Proteases

ABSTRACT

The invention provides human proteases (PRTS) and polynucleotides which identify and encode PRTS. The invention also provides expression vectors, host cells, antibodies, agonists, and antagonists. The invention also provides methods for diagnosing, treating, or preventing disorders associated with aberrant expression of PRTS.

[0001] This application is a continuation application of PCT application PCT/US01/22397, filed Jul. 17, 2001 and published in English as WO02/08396 on Jan. 31, 2002, which claims the benefit of provisional applications U.S. Ser. No. 60/220,063, filed Jul. 21, 2000; U.S. Ser. No. 60/221,680, filed Jul. 28, 2000; U.S. Ser. No. 60/223,544, filed Aug. 4, 2000; U.S. Ser. No. 60/224,717, filed Aug. 11, 2000; U.S. Ser. No. 60/225,988, filed Aug. 16, 2000; and U.S. Ser. No. 60/227,568, filed Aug. 23, 2000, all of which applications and patents are hereby incorporated herein by reference.

TECHNICAL FIELD

[0002] This invention relates to nucleic acid and amino acid sequences of proteases and to the use of these sequences in the diagnosis, treatment, and prevention of gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders, and in the assessment of the effects of exogenous compounds on the expression of nucleic acid and amino acid sequences of proteases.

BACKGROUND OF THE INVENTION

[0003] Proteases cleave proteins and peptides at the peptide bond that forms the backbone of the protein or peptide chain. Proteolysis is one of the most important and frequent enzymatic reactions that occurs both within and outside of cells. Proteolysis is responsible for the activation and maturation of nascent polypeptides, the degradation of misfolded and damaged proteins, and the controlled turnover of peptides within the cell. Proteases participate in digestion, endocrine function, and tissue remodeling during embryonic development, wound healing, and normal growth. Proteases can play a role in regulatory processes by affecting the half-life of regulatory proteins. Proteases are involved in the etiology or progression of disease states such as inflammation, angiogenesis, tumor dispersion and metastasis, cardiovascular disease, neurological disease, and bacterial, parasitic, and viral infections.

[0004] Proteases can be categorized on the basis of where they cleave their substrates. Exopeptidases, which include aminopeptidases, dipeptidyl peptidases, tripeptidases, carboxypeptidases, peptidyl-di-peptidases, dipeptidases, and omega peptidases, cleave residues at the termini of their substrates. Endopeptidases, including serine proteases, cysteine proteases, and metalloproteases, cleave at residues within the peptide. Four principal categories of mammalian proteases have been identified based on active site structure, mechanism of action, and overall three-dimensional structure. (See Beynon, R. J. and J. S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York N.Y., pp. 1-5.)

[0005] Serine Proteases

[0006] The serine proteases (SPs) are a large, widespread family of proteolytic enzymes that include the digestive enzymes trypsin and chymotrypsin, components of the complement and blood-clotting cascades, and enzymes that control the degradation and turnover of macromolecules within the cell and in the extracellular matrix. Most of the more than 20 subfamilies can be grouped into six clans, each with a common ancestor. These six clans are hypothesized to have descended from at least four evolutionarily distinct ancestors. SPs are named for the presence of a serine residue found in the active catalytic site of most families. The active site is defined by the catalytic triad, a set of conserved aspartate, histidine, and serine residues critical for catalysis. These residues form a charge relay network that facilitates substrate binding. Other residues outside the active site form an oxyanion hole that stabilizes the tetrahedral transition intermediate formed during catalysis. SPs have a wide range of substrates and can be subdivided into subfamilies on the basis of their substrate specificity. The main subfamilies are named for the residue(s) after which they cleave: trypases (after arginine or lysine), aspases (after aspartate), chymases (after phenylalanine or leucine), metases (after methionine), and serases (after serine) (Rawlings, N. D. and A. J. Barrett (1994) Methods Enzymol. 244:19-61).
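The subfamily naming rule above can be illustrated with a minimal sketch, assuming one-letter amino acid codes and a simplified model in which an enzyme cuts after every occurrence of its preferred residue(s). The function name cleave_after, the SPECIFICITY table, and the example peptide are hypothetical and are not taken from the cited reference; real enzymes are also influenced by neighboring residues and substrate structure, which this sketch ignores.

```python
# Simplified in-silico digestion: cut the peptide bond on the C-terminal side
# of every residue listed in `after`. Names here are illustrative assumptions.

SPECIFICITY = {
    "trypase": "RK",   # after arginine or lysine
    "aspase": "D",     # after aspartate
    "chymase": "FL",   # after phenylalanine or leucine
    "metase": "M",     # after methionine
    "serase": "S",     # after serine
}


def cleave_after(peptide: str, after: str) -> list[str]:
    """Return the fragments produced by cutting after each residue in `after`."""
    fragments = []
    start = 0
    for i, residue in enumerate(peptide):
        # A cut after the final residue would yield an empty fragment; skip it.
        if residue in after and i < len(peptide) - 1:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    fragments.append(peptide[start:])
    return fragments


if __name__ == "__main__":
    print(cleave_after("AKGDRFLM", SPECIFICITY["trypase"]))  # ['AK', 'GDR', 'FLM']
```

The same helper applied with SPECIFICITY["aspase"] models cleavage after aspartate residues.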

[0007] Most mammalian serine proteases are synthesized as zymogens, inactive precursors that are activated by proteolysis. For example, trypsinogen is converted to its active form, trypsin, by enteropeptidase. Enteropeptidase is an intestinal protease that removes an N-terminal fragment from trypsinogen. The remaining active fragment is trypsin, which in turn activates the precursors of the other pancreatic enzymes. Likewise, proteolysis of prothrombin, the precursor of thrombin, generates three separate polypeptide fragments. The N-terminal fragment is released while the other two fragments, which comprise active thrombin, remain associated through disulfide bonds.

[0008] The two largest SP subfamilies are the chymotrypsin (S1) and subtilisin (S8) families. Some members of the chymotrypsin family contain two structural domains unique to this family. Kringle domains are triple-looped, disulfide cross-linked domains found in varying copy number. Kringles are thought to play a role in binding mediators such as membranes, other proteins or phospholipids, and in the regulation of proteolytic activity (PROSITE PDOC00020). Apple domains are 90 amino-acid repeated domains, each containing six conserved cysteines. Three disulfide bonds link the first and sixth, second and fifth, and third and fourth cysteines (PROSITE PDOC00376). Apple domains are involved in protein-protein interactions. S1 family members include trypsin, chymotrypsin, coagulation factors IX-XII, complement factors B, C, and D, granzymes, kallikrein, and tissue- and urokinase-plasminogen activators. The subtilisin family has members found in the eubacteria, archaebacteria, eukaryotes, and viruses. Subtilisins include the proprotein-processing endopeptidases kexin and furin and the pituitary prohormone convertases PC1, PC2, PC3, PC6, and PACE4 (Rawlings and Barrett, supra).

[0009] SPs have functions in many normal processes and some have been implicated in the etiology or treatment of disease. Enterokinase, the initiator of intestinal digestion, is found in the intestinal brush border, where it cleaves the acidic propeptide from trypsinogen to yield active trypsin (Kitamoto, Y. et al. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592). Prolylcarboxypeptidase, a lysosomal serine peptidase that cleaves peptides such as angiotensin II and III and [des-Arg9] bradykinin, shares sequence homology with members of both the serine carboxypeptidase and prolylendopeptidase families (Tan, F. et al. (1993) J. Biol. Chem. 268:16631-16638). The protease neuropsin may influence synapse formation and neuronal connectivity in the hippocampus in response to neural signaling (Chen, Z.-L. et al. (1995) J. Neurosci. 15:5088-5097). Tissue plasminogen activator is useful for acute management of stroke (Zivin, J. A. (1999) Neurology 53:14-19) and myocardial infarction (Ross, A. M. (1999) Clin. Cardiol. 22:165-171). Some receptors (PAR, for proteinase-activated receptor), highly expressed throughout the digestive tract, are activated by proteolytic cleavage of an extracellular domain. The major agonists for PARs, thrombin, trypsin, and mast cell tryptase, are released in allergy and inflammatory conditions. Control of PAR activation by proteases has been suggested as a promising therapeutic target (Vergnolle, N. (2000) Aliment. Pharmacol. Ther. 14:257-266; Rice, K. D. et al. (1998) Curr. Pharm. Des. 4:381-396). Prostate-specific antigen (PSA) is a kallikrein-like serine protease synthesized and secreted exclusively by epithelial cells in the prostate gland. Serum PSA is elevated in prostate cancer and is the most sensitive physiological marker for monitoring cancer progression and response to therapy. PSA can also identify the prostate as the origin of a metastatic tumor (Brawer, M. K. and P. H. Lange (1989) Urology 33:11-16).

[0010] The signal peptidase is a specialized class of SP found in all prokaryotic and eukaryotic cell types that serves in the processing of signal peptides from certain proteins. Signal peptides are amino-terminal domains of a protein which direct the protein from its ribosomal assembly site to a particular cellular or extracellular location. Once the protein has been exported, removal of the signal sequence by a signal peptidase and posttranslational processing, e.g., glycosylation or phosphorylation, activate the protein. Signal peptidases exist as multi-subunit complexes in both yeast and mammals. The canine signal peptidase complex is composed of five subunits, all associated with the microsomal membrane and containing hydrophobic regions that span the membrane one or more times (Shelness, G. S. and G. Blobel (1990) J. Biol. Chem. 265:9512-9519). Some of these subunits serve to fix the complex in its proper position on the membrane while others contain the actual catalytic activity.

[0011] Another family of proteases which have a serine in their active site are dependent on the hydrolysis of ATP for their activity. These proteases contain proteolytic core domains and regulatory ATPase domains which can be identified by the presence of the P-loop, an ATP/GTP-binding motif (PROSITE PDOC00803). Members of this family include the eukaryotic mitochondrial matrix proteases, Clp protease and the proteasome. Clp protease was originally found in plant chloroplasts but is believed to be widespread in both prokaryotic and eukaryotic cells. The gene for early-onset torsion dystonia encodes a protein related to Clp protease (Ozelius, L. J. et al. (1998) Adv. Neurol. 78:93-105).

[0012] The proteasome is an intracellular protease complex found in some bacteria and in all eukaryotic cells, and plays an important role in cellular physiology. Proteasomes are associated with the ubiquitin conjugation system (UCS), a major pathway for the degradation of cellular proteins of all types, including proteins that function to activate or repress cellular processes such as transcription and cell cycle progression (Ciechanover, A. (1994) Cell 79:13-21). In the UCS pathway, proteins targeted for degradation are conjugated to ubiquitin, a small heat stable protein. The ubiquitinated protein is then recognized and degraded by the proteasome. The resultant ubiquitin-peptide complex is hydrolyzed by a ubiquitin carboxyl terminal hydrolase, and free ubiquitin is released for reutilization by the UCS. Ubiquitin-proteasome systems are implicated in the degradation of mitotic cyclic kinases, oncoproteins, tumor suppressor genes (p53), cell surface receptors associated with signal transduction, transcriptional regulators, and mutated or damaged proteins (Ciechanover, supra). This pathway has been implicated in a number of diseases, including cystic fibrosis, Angelman's syndrome, and Liddle syndrome (reviewed in Schwartz, A. L. and A. Ciechanover (1999) Annu. Rev. Med. 50:57-74). A murine proto-oncogene, Unp, encodes a nuclear ubiquitin protease whose overexpression leads to oncogenic transformation of NIH3T3 cells. The human homologue of this gene is consistently elevated in small cell tumors and adenocarcinomas of the lung (Gray, D. A. (1995) Oncogene 10:2179-2183). Ubiquitin carboxyl terminal hydrolase is involved in the differentiation of a lymphoblastic leukemia cell line to a non-dividing mature state (Maki, A. et al. (1996) Differentiation 60:59-66). In neurons, ubiquitin carboxyl terminal hydrolase (PGP 9.5) expression is strong in the abnormal structures that occur in human neurodegenerative diseases (Lowe, J. et al. (1990) J. Pathol. 161:153-160). The proteasome is a large (2000 kDa) multisubunit complex composed of a central catalytic core containing a variety of proteases arranged in four seven-membered rings with the active sites facing inwards into the central cavity, and terminal ATPase subunits covering the outer port of the cavity and regulating substrate entry (for review, see Schmidt, M. et al. (1999) Curr. Opin. Chem. Biol. 3:584-591).

[0013] Cysteine Proteases

[0014] Cysteine proteases (CPs) are involved in diverse cellular processes ranging from the processing of precursor proteins to intracellular degradation. Nearly half of the CPs known are present only in viruses. CPs have a cysteine as the major catalytic residue at the active site where catalysis proceeds via a thioester intermediate and is facilitated by nearby histidine and asparagine residues. A glutamine residue is also important, as it helps to form an oxyanion hole. Two important CP families include the papain-like enzymes (C1) and the calpains (C2). Papain-like family members are generally lysosomal or secreted and therefore are synthesized with signal peptides as well as propeptides. Most members bear a conserved motif in the propeptide that may have structural significance (Karrer, K. M. et al. (1993) Proc. Natl. Acad. Sci. USA 90:3063-3067). Three-dimensional structures of papain family members show a bilobed molecule with the catalytic site located between the two lobes. Papains include cathepsins B, C, H, L, and S, certain plant allergens and dipeptidyl peptidase (for a review, see Rawlings, N. D. and A. J. Barrett (1994) Methods Enzymol. 244:461-486).

[0015] Some CPs are expressed ubiquitously, while others are produced only by cells of the immune system. Of particular note, CPs are produced by monocytes, macrophages and other cells which migrate to sites of inflammation and secrete molecules involved in tissue repair. Overabundance of these repair molecules plays a role in certain disorders. In autoimmune diseases such as rheumatoid arthritis, secretion of the cysteine peptidase cathepsin C degrades collagen, laminin, elastin and other structural proteins found in the extracellular matrix of bones. Bone weakened by such degradation is also more susceptible to tumor invasion and metastasis. Cathepsin L expression may also contribute to the influx of mononuclear cells which exacerbates the destruction of the rheumatoid synovium (Keyszer, G. M. (1995) Arthritis Rheum. 38:976-984).

[0016] Calpains are calcium-dependent cytosolic endopeptidases which contain both an N-terminal catalytic domain and a C-terminal calcium-binding domain. Calpain is expressed as a proenzyme heterodimer consisting of a catalytic subunit unique to each isoform and a regulatory subunit common to different isoforms. Each subunit bears a calcium-binding EF-hand domain. The regulatory subunit also contains a hydrophobic glycine-rich domain that allows the enzyme to associate with cell membranes. Calpains are activated by increased intracellular calcium concentration, which induces a change in conformation and limited autolysis. The resultant active molecule requires a lower calcium concentration for its activity (Chan, S. L. and M. P. Mattson (1999) J. Neurosci. Res. 58:167-190). Calpain expression is predominantly neuronal, although it is present in other tissues. Several chronic neurodegenerative disorders, including ALS, Parkinson's disease and Alzheimer's disease, are associated with increased calpain expression (Chan and Mattson, supra). Calpain-mediated breakdown of the cytoskeleton has been proposed to contribute to brain damage resulting from head injury (McCracken, E. et al. (1999) J. Neurotrauma 16:749-761). Calpain-3 is predominantly expressed in skeletal muscle, and is responsible for limb-girdle muscular dystrophy type 2A (Minami, N. et al. (1999) J. Neurol. Sci. 171:31-37).

[0017] Another family of thiol proteases is the caspases, which are involved in the initiation and execution phases of apoptosis. A pro-apoptotic signal can activate initiator caspases that trigger a proteolytic caspase cascade, leading to the hydrolysis of target proteins and the classic apoptotic death of the cell. Two active site residues, a cysteine and a histidine, have been implicated in the catalytic mechanism. Caspases are among the most specific endopeptidases, cleaving after aspartate residues. Caspases are synthesized as inactive zymogens consisting of one large (p20) and one small (p10) subunit separated by a small spacer region, and a variable N-terminal prodomain. This prodomain interacts with cofactors that can positively or negatively affect apoptosis. An activating signal causes autoproteolytic cleavage of a specific aspartate residue (D297 in the caspase-1 numbering convention) and removal of the spacer and prodomain, leaving a p10/p20 heterodimer. Two of these heterodimers interact via their small subunits to form the catalytically active tetramer. The long prodomains of some caspase family members have been shown to promote dimerization and auto-processing of procaspases. Some caspases contain a “death effector domain” in their prodomain by which they can be recruited into self-activating complexes with other caspases and FADD protein associated death receptors or the TNF receptor complex. In addition, two dimers from different caspase family members can associate, changing the substrate specificity of the resultant tetramer. Endogenous caspase inhibitors (inhibitor of apoptosis proteins, or IAPs) also exist. All these interactions have clear effects on the control of apoptosis (reviewed in Chan and Mattson, supra; Salveson, G. S. and V. M. Dixit (1999) Proc. Natl. Acad. Sci. USA 96:10964-10967).

[0018] Caspases have been implicated in a number of diseases. Mice lacking some caspases have severe nervous system defects due to failed apoptosis in the neuroepithelium and suffer early lethality. Others show severe defects in the inflammatory response, as caspases are responsible for processing IL-1b and possibly other inflammatory cytokines (Chan and Mattson, supra). Cowpox virus and baculoviruses target caspases to avoid the death of their host cell and promote successful infection. In addition, increases in inappropriate apoptosis have been reported in AIDS, neurodegenerative diseases and ischemic injury, while a decrease in cell death is associated with cancer (Salveson and Dixit, supra; Thompson, C. B. (1995) Science 267:1456-1462).

[0019] Aspartyl Proteases

[0020] Aspartyl proteases (APs) include the lysosomal proteases cathepsins D and E, as well as chymosin, renin, and the gastric pepsins. Most retroviruses encode an AP, usually as part of the pol polyprotein. APs, also called acid proteases, are monomeric enzymes consisting of two domains, each domain containing one half of the active site with its own catalytic aspartic acid residue. APs are most active in the range of pH 2-3, at which one of the aspartate residues is ionized and the other neutral. The pepsin family of APs contains many secreted enzymes, and all are likely to be synthesized with signal peptides and propeptides. Most family members have three disulfide loops, the first ~5 residue loop following the first aspartate, the second 5-6 residue loop preceding the second aspartate, and the third and largest loop occurring toward the C terminus. Retropepsins, on the other hand, are analogous to a single domain of pepsin, and become active as homodimers with each retropepsin monomer contributing one half of the active site. Retropepsins are required for processing the viral polyproteins.

[0021] APs have roles in various tissues, and some have been associated with disease. Renin mediates the first step in processing the hormone angiotensin, which is responsible for regulating electrolyte balance and blood pressure (reviewed in Crews, D. E. and S. R. Williams (1999) Hum. Biol. 71:475-503). Abnormal regulation and expression of cathepsins are evident in various inflammatory disease states. Expression of cathepsin D is elevated in synovial tissues from patients with rheumatoid arthritis and osteoarthritis. The increased expression and differential regulation of the cathepsins are linked to the metastatic potential of a variety of cancers (Chambers, A. F. et al. (1993) Crit. Rev. Oncol. 4:95-114).

[0022] Metalloproteases

[0023] Metalloproteases require a metal ion for activity, usually manganese or zinc. Examples of manganese metalloenzymes include aminopeptidase P and human proline dipeptidase (PEPD). Aminopeptidase P can degrade bradykinin, a nonapeptide activated in a variety of inflammatory responses. Aminopeptidase P has been implicated in coronary ischemia/reperfusion injury. Administration of aminopeptidase P inhibitors has been shown to have a cardioprotective effect in rats (Ersahin, C. et al. (1999) J. Cardiovasc. Pharmacol. 34:604-611).

[0024] Most zinc-dependent metalloproteases share a common sequence in the zinc-binding domain. The active site is made up of two histidines which act as zinc ligands and a catalytic glutamic acid C-terminal to the first histidine. Proteins containing this signature sequence are known as the metzincins and include aminopeptidase N, angiotensin-converting enzyme, neurolysin, the matrix metalloproteases and the adamalysins (ADAMS). An alternate sequence is found in the zinc carboxypeptidases, in which all three conserved residues, two histidines and a glutamic acid, are involved in zinc binding.

[0025] A number of the neutral metalloendopeptidases, including angiotensin converting enzyme and the aminopeptidases, are involved in the metabolism of peptide hormones. High aminopeptidase B activity, for example, is found in the adrenal glands and neurohypophyses of hypertensive rats (Prieto, I. et al. (1998) Horm. Metab. Res. 30:246-248). Oligopeptidase M/neurolysin can hydrolyze bradykinin as well as neurotensin (Serizawa, A. et al. (1995) J. Biol. Chem. 270:2092-2098). Neurotensin is a vasoactive peptide that can act as a neurotransmitter in the brain, where it has been implicated in limiting food intake (Tritos, N. A. et al. (1999) Neuropeptides 33:339-349).

[0026] The matrix metalloproteases (MMPs) are a family of at least 23 enzymes that can degrade components of the extracellular matrix (ECM). They are Zn²⁺ endopeptidases with an N-terminal catalytic domain. Nearly all members of the family have a hinge peptide and C-terminal domain which can bind to substrate molecules in the ECM or to inhibitors produced by the tissue (TIMPs, for tissue inhibitor of metalloprotease; Campbell, I. L. et al. (1999) Trends Neurosci. 22:285). The presence of fibronectin-like repeats, transmembrane domains, or C-terminal hemopexinase-like domains can be used to separate MMPs into collagenase, gelatinase, stromelysin and membrane-type MMP subfamilies. In the inactive form, the Zn²⁺ ion in the active site interacts with a cysteine in the pro-sequence. Activating factors disrupt the Zn²⁺-cysteine interaction, or “cysteine switch,” exposing the active site. This partially activates the enzyme, which then cleaves off its propeptide and becomes fully active. MMPs are often activated by the serine proteases plasmin and furin. MMPs are often regulated by stoichiometric, noncovalent interactions with inhibitors; the balance of protease to inhibitor, then, is very important in tissue homeostasis (reviewed in Yong, V. W. et al. (1998) Trends Neurosci. 21:75).

[0027] MMPs are implicated in a number of diseases including osteoarthritis (Mitchell, P. et al. (1996) J. Clin. Invest. 97:761), atherosclerotic plaque rupture (Sukhova, G. K. et al. (1999) Circulation 99:2503), aortic aneurysm (Schneiderman, J. et al. (1998) Am. J. Path. 152:703), non-healing wounds (Saarialho-Kere, U. K. et al. (1994) J. Clin. Invest. 94:79), bone resorption (Blavier, L. and J. M. Delaisse (1995) J. Cell Sci. 108:3649), age-related macular degeneration (Steen, B. et al. (1998) Invest. Ophthalmol. Vis. Sci. 39:2194), emphysema (Finlay, G. A. et al. (1997) Thorax 52:502), myocardial infarction (Rohde, L. E. et al. (1999) Circulation 99:3063) and dilated cardiomyopathy (Thomas, C. V. et al. (1998) Circulation 97:1708). MMP inhibitors prevent metastasis of mammary carcinoma and experimental tumors in rat, and Lewis lung carcinoma, hemangioma, and human ovarian carcinoma xenografts in mice (Eccles, S. A. et al. (1996) Cancer Res. 56:2815; Anderson et al. (1996) Cancer Res. 56:715-718; Volpert, O. V. et al. (1996) J. Clin. Invest. 98:671; Taraboletti, G. et al. (1995) J. NCI 87:293; Davies, B. et al. (1993) Cancer Res. 53:2087). MMPs may be active in Alzheimer's disease. A number of MMPs are implicated in multiple sclerosis, and administration of MMP inhibitors can relieve some of its symptoms (reviewed in Yong, supra).

[0028] Another family of metalloproteases is the ADAMs, for A Disintegrin and Metalloprotease Domain, which they share with their close relatives the adamalysins, snake venom metalloproteases (SVMPs). ADAMs combine features of both cell surface adhesion molecules and proteases, containing a prodomain, a protease domain, a disintegrin domain, a cysteine rich domain, an epidermal growth factor repeat, a transmembrane domain, and a cytoplasmic tail. The first three domains listed above are also found in the SVMPs. The ADAMs possess four potential functions: proteolysis, adhesion, signaling and fusion. The ADAMs share the metzincin zinc binding sequence and are inhibited by some MMP antagonists such as TIMP-1.

[0029] ADAMs are implicated in such processes as sperm-egg binding and fusion, myoblast fusion, and protein-ectodomain processing or shedding of cytokines, cytokine receptors, adhesion proteins and other extracellular protein domains (Schlöndorff, J. and C. P. Blobel (1999) J. Cell. Sci. 112:3603-3617). The Kuzbanian protein cleaves a substrate in the NOTCH pathway (possibly NOTCH itself), activating the program for lateral inhibition in Drosophila neural development. Two ADAMs, TACE (ADAM 17) and ADAM 10, are proposed to have analogous roles in the processing of amyloid precursor protein in the brain (Schlöndorff and Blobel, supra). TACE has also been identified as the TNF activating enzyme (Black, R. A. et al. (1997) Nature 385:729). TNF is a pleiotropic cytokine that is important in mobilizing host defenses in response to infection or trauma, but can cause severe damage in excess and is often overproduced in autoimmune disease. TACE cleaves membrane-bound pro-TNF to release a soluble form. Other ADAMs may be involved in a similar type of processing of other membrane-bound molecules. MADDAM (for metalloprotease and disintegrin dendritic antigen marker), a member of the ADAM19 family, is up-regulated in monocytes induced to become dendritic cells. It is useful as a marker for distinguishing between dendritic cells and macrophages (Fritsche, J. et al. (2000) Blood 96:732-739).

[0030] The ADAMTS sub-family has all of the features of ADAM family metalloproteases and contains an additional thrombospondin domain (TS). The prototypic ADAMTS was identified in mouse, found to be expressed in heart and kidney and upregulated by proinflammatory stimuli (Kuno, K. et al. (1997) J. Biol. Chem. 272:556-562). To date eleven members are recognized by the Human Genome Organization (HUGO; http://www.gene.ucl.ac.uk/usersihester/adamts.html#Approved). Members of this family have the ability to degrade aggrecan, a high molecular weight proteoglycan which provides cartilage with important mechanical properties including compressibility, and which is lost during the development of arthritis. Enzymes which degrade aggrecan are thus considered attractive targets to prevent and slow the degradation of articular cartilage (See, e.g., Tortorella, M. D. (1999) Science 284:1664; Abbaszade, I. (1999) J. Biol. Chem. 274:23443). Other members are reported to have antiangiogenic potential (Kuno et al., supra) and/or a role in procollagen processing (Colige, A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2374).

[0031] The discovery of new proteases, and the polynucleotides encoding them, satisfies a need in the art by providing new compositions which are useful in the diagnosis, prevention, and treatment of gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders, and in the assessment of the effects of exogenous compounds on the expression of nucleic acid and amino acid sequences of proteases.

SUMMARY OF THE INVENTION

[0032] The invention features purified polypeptides, proteases, referred to collectively as “PRTS” and individually as “PRTS-1,” “PRTS-2,” “PRTS-3,” “PRTS-4,” “PRTS-5,” “PRTS-6,” “PRTS-7,” “PRTS-8,” “PRTS-9,” “PRTS-10,” “PRTS-11,” “PRTS-12,” “PRTS-13,” “PRTS-14,” “PRTS-15,” “PRTS-16,” “PRTS-17,” “PRTS-18,” “PRTS-19,” “PRTS-20,” and “PRTS-21.” In one aspect, the invention provides an isolated polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. In one alternative, the invention provides an isolated polypeptide comprising the amino acid sequence of SEQ ID NO:1-21.
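The "at least 90% identical" criterion recited above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it assumes one particular counting convention (identical columns divided by all aligned columns); this section does not fix a convention, and the function name percent_identity and the example sequences are hypothetical.

```python
# Percent identity over a precomputed pairwise alignment.
# The denominator convention (all columns except gap-gap columns) is an
# assumption made for illustration only.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity for two equal-length, gap-padded sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    reference = "MKTAYIAKQR"   # made-up sequence, not a SEQ ID NO
    candidate = "MKTAYIAKQL"   # differs at one of ten positions
    identity = percent_identity(reference, candidate)
    print(identity, identity >= 90.0)  # 90.0 True
```

A different denominator, such as the length of the shorter sequence, would give different values near the threshold, so the convention should be stated whenever such a figure is reported.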

[0033] The invention further provides an isolated polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. In one alternative, the polynucleotide encodes a polypeptide selected from the group consisting of SEQ ID NO:1-21. In another alternative, the polynucleotide is selected from the group consisting of SEQ ID NO:22-42.

[0034] Additionally, the invention provides a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide encoding a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. In one alternative, the invention provides a cell transformed with the recombinant polynucleotide. In another alternative, the invention provides a transgenic organism comprising the recombinant polynucleotide.

[0035] The invention also provides a method for producing a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. The method comprises a) culturing a cell under conditions suitable for expression of the polypeptide, wherein said cell is transformed with a recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide encoding the polypeptide, and b) recovering the polypeptide so expressed.

[0036] Additionally, the invention provides an isolated antibody which specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21.

[0037] The invention further provides an isolated polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). In one alternative, the polynucleotide comprises at least 60 contiguous nucleotides.

[0038] Additionally, the invention provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). The method comprises a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and optionally, if present, the amount thereof. In one alternative, the probe comprises at least 60 contiguous nucleotides.

[0039] The invention further provides a method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide selected from the group consisting of a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, c) a polynucleotide complementary to the polynucleotide of a), d) a polynucleotide complementary to the polynucleotide of b), and e) an RNA equivalent of a)-d). The method comprises a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.

[0040] The invention further provides a composition comprising an effective amount of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and a pharmaceutically acceptable excipient. In one embodiment, the composition comprises an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. The invention additionally provides a method of treating a disease or condition associated with decreased expression of functional PRTS, comprising administering to a patient in need of such treatment the composition.

[0041] The invention also provides a method for screening a compound for effectiveness as an agonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting agonist activity in the sample. In one alternative, the invention provides a composition comprising an agonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with decreased expression of functional PRTS, comprising administering to a patient in need of such treatment the composition.

[0042] Additionally, the invention provides a method for screening a compound for effectiveness as an antagonist of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. The method comprises a) exposing a sample comprising the polypeptide to a compound, and b) detecting antagonist activity in the sample. In one alternative, the invention provides a composition comprising an antagonist compound identified by the method and a pharmaceutically acceptable excipient. In another alternative, the invention provides a method of treating a disease or condition associated with overexpression of functional PRTS, comprising administering to a patient in need of such treatment the composition.

[0043] The invention further provides a method of screening for a compound that specifically binds to a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. The method comprises a) combining the polypeptide with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide to the test compound, thereby identifying a compound that specifically binds to the polypeptide.

[0044] The invention further provides a method of screening for a compound that modulates the activity of a polypeptide selected from the group consisting of a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21. The method comprises a) combining the polypeptide with at least one test compound under conditions permissive for the activity of the polypeptide, b) assessing the activity of the polypeptide in the presence of the test compound, and c) comparing the activity of the polypeptide in the presence of the test compound with the activity of the polypeptide in the absence of the test compound, wherein a change in the activity of the polypeptide in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide.

[0045] The invention further provides a method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a sequence selected from the group consisting of SEQ ID NO:22-42, the method comprising a) exposing a sample comprising the target polynucleotide to a compound, and b) detecting altered expression of the target polynucleotide.

[0046] The invention further provides a method for assessing toxicity of a test compound, said method comprising a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, iii) a polynucleotide having a sequence complementary to i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv). Hybridization occurs under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide selected from the group consisting of i) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, ii) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, iii) a polynucleotide complementary to the polynucleotide of i), iv) a polynucleotide complementary to the polynucleotide of ii), and v) an RNA equivalent of i)-iv). Alternatively, the target polynucleotide comprises a fragment of a polynucleotide sequence selected from the group consisting of i)-v) above; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.

BRIEF DESCRIPTION OF THE TABLES

[0047] Table 1 summarizes the nomenclature for the full length polynucleotide and polypeptide sequences of the present invention.

[0048] Table 2 shows the GenBank identification number and annotation of the nearest GenBank homolog for polypeptides of the invention. The probability score for the match between each polypeptide and its GenBank homolog is also shown.

[0049] Table 3 shows structural features of polypeptide sequences of the invention, including predicted motifs and domains, along with the methods, algorithms, and searchable databases used for analysis of the polypeptides.

[0050] Table 4 lists the cDNA and/or genomic DNA fragments which were used to assemble polynucleotide sequences of the invention, along with selected fragments of the polynucleotide sequences.

[0051] Table 5 shows the representative cDNA library for polynucleotides of the invention.

[0052] Table 6 provides an appendix which describes the tissues and vectors used for construction of the cDNA libraries shown in Table 5.

[0053] Table 7 shows the tools, programs, and algorithms used to analyze the polynucleotides and polypeptides of the invention, along with applicable descriptions, references, and threshold parameters.

DESCRIPTION OF THE INVENTION

[0054] Before the present proteins, nucleotide sequences, and methods are described, it is understood that this invention is not limited to the particular machines, materials and methods described, as these may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims.

[0055] It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to “a host cell” includes a plurality of such host cells, and a reference to “an antibody” is a reference to one or more antibodies and equivalents thereof known to those skilled in the art, and so forth.

[0056] Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any machines, materials, and methods similar or equivalent to those described herein can be used to practice or test the present invention, the preferred machines, materials and methods are now described. All publications mentioned herein are cited for the purpose of describing and disclosing the cell lines, protocols, reagents and vectors which are reported in the publications and which might be used in connection with the invention. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.

[0057] Definitions

[0058] “PRTS” refers to the amino acid sequences of substantially purified PRTS obtained from any species, particularly a mammalian species, including bovine, ovine, porcine, murine, equine, and human, and from any source, whether natural, synthetic, semi-synthetic, or recombinant.

[0059] The term “agonist” refers to a molecule which intensifies or mimics the biological activity of PRTS. Agonists may include proteins, nucleic acids, carbohydrates, small molecules, or any other compound or composition which modulates the activity of PRTS either by directly interacting with PRTS or by acting on components of the biological pathway in which PRTS participates.

[0060] An “allelic variant” is an alternative form of the gene encoding PRTS. Allelic variants may result from at least one mutation in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose structure or function may or may not be altered. A gene may have none, one, or many allelic variants of its naturally occurring form. Common mutational changes which give rise to allelic variants are generally ascribed to natural deletions, additions, or substitutions of nucleotides. Each of these types of changes may occur alone, or in combination with the others, one or more times in a given sequence.

[0061] “Altered” nucleic acid sequences encoding PRTS include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polypeptide the same as PRTS or a polypeptide with at least one functional characteristic of PRTS. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding PRTS, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding PRTS. The encoded protein may also be “altered,” and may contain deletions, insertions, or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent PRTS. Deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the biological or immunological activity of PRTS is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid, and positively charged amino acids may include lysine and arginine. Amino acids with uncharged polar side chains having similar hydrophilicity values may include: asparagine and glutamine; and serine and threonine. Amino acids with uncharged side chains having similar hydrophilicity values may include: leucine, isoleucine, and valine; glycine and alanine; and phenylalanine and tyrosine.

[0062] The terms “amino acid” and “amino acid sequence” refer to an oligopeptide, peptide, polypeptide, or protein sequence, or a fragment of any of these, and to naturally occurring or synthetic molecules. Where “amino acid sequence” is recited to refer to a sequence of a naturally occurring protein molecule, “amino acid sequence” and like terms are not meant to limit the amino acid sequence to the complete native amino acid sequence associated with the recited protein molecule.

[0063] “Amplification” relates to the production of additional copies of a nucleic acid sequence. Amplification is generally carried out using polymerase chain reaction (PCR) technologies well known in the art.

[0064] The term “antagonist” refers to a molecule which inhibits or attenuates the biological activity of PRTS. Antagonists may include proteins such as antibodies, nucleic acids, carbohydrates, small molecules, or any other compound or composition which modulates the activity of PRTS either by directly interacting with PRTS or by acting on components of the biological pathway in which PRTS participates.

[0065] The term “antibody” refers to intact immunoglobulin molecules as well as to fragments thereof, such as Fab, F(ab′)₂, and Fv fragments, which are capable of binding an epitopic determinant. Antibodies that bind PRTS polypeptides can be prepared using intact polypeptides or using fragments containing small peptides of interest as the immunizing antigen. The polypeptide or oligopeptide used to immunize an animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal.

[0066] The term “antigenic determinant” refers to that region of a molecule (i.e., an epitope) that makes contact with a particular antibody. When a protein or a fragment of a protein is used to immunize a host animal, numerous regions of the protein may induce the production of antibodies which bind specifically to antigenic determinants (particular regions or three-dimensional structures on the protein). An antigenic determinant may compete with the intact antigen (i.e., the immunogen used to elicit the immune response) for binding to an antibody.

[0067] The term “antisense” refers to any composition capable of base-pairing with the “sense” (coding) strand of a specific nucleic acid sequence. Antisense compositions may include DNA; RNA; peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified sugar groups such as 2′-methoxyethyl sugars or 2′-methoxyethoxy sugars; or oligonucleotides having modified bases such as 5-methyl cytosine, 2′-deoxyuracil, or 7-deaza-2′-deoxyguanosine. Antisense molecules may be produced by any method including chemical synthesis or transcription. Once introduced into a cell, the complementary antisense molecule base-pairs with a naturally occurring nucleic acid sequence produced by the cell to form duplexes which block either transcription or translation. The designation “negative” or “minus” can refer to the antisense strand, and the designation “positive” or “plus” can refer to the sense strand of a reference DNA molecule.

[0068] The term “biologically active” refers to a protein having structural, regulatory, or biochemical functions of a naturally occurring molecule. Likewise, “immunologically active” or “immunogenic” refers to the capability of the natural, recombinant, or synthetic PRTS, or of any oligopeptide thereof, to induce a specific immune response in appropriate animals or cells and to bind with specific antibodies.

[0069] “Complementary” describes the relationship between two single-stranded nucleic acid sequences that anneal by base-pairing. For example, 5′-AGT-3′ pairs with its complement, 3′-TCA-5′.
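The base-pairing relationship in the example above can be illustrated with a minimal Python sketch. The names `PAIRS`, `complement`, and `reverse_complement` are hypothetical helpers introduced here for illustration only; the sketch assumes uppercase DNA bases and standard Watson-Crick pairing (A with T, G with C).

```python
# Watson-Crick pairing for DNA bases (uppercase input assumed in this sketch).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Return the base-by-base complement, written in the same left-to-right order.

    If the input is read 5'->3', the result is therefore read 3'->5'.
    """
    return "".join(PAIRS[base] for base in seq)


def reverse_complement(seq):
    """Return the complementary strand rewritten in the 5'->3' direction."""
    return complement(seq)[::-1]


print(complement("AGT"))          # TCA, i.e. 3'-TCA-5' for the input 5'-AGT-3'
print(reverse_complement("AGT"))  # ACT, the same strand written 5'->3'
```

Writing both functions makes the strand orientation explicit: the complement is the same molecule in either case, and only the direction in which it is written differs.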

[0070] A “composition comprising a given polynucleotide sequence” and a “composition comprising a given amino acid sequence” refer broadly to any composition containing the given polynucleotide or amino acid sequence. The composition may comprise a dry formulation or an aqueous solution. Compositions comprising polynucleotide sequences encoding PRTS or fragments of PRTS may be employed as hybridization probes. The probes may be stored in freeze-dried form and may be associated with a stabilizing agent such as a carbohydrate. In hybridizations, the probe may be deployed in an aqueous solution containing salts (e.g., NaCl), detergents (e.g., sodium dodecyl sulfate; SDS), and other components (e.g., Denhardt's solution, dry milk, salmon sperm DNA, etc.).

[0071] “Consensus sequence” refers to a nucleic acid sequence which has been subjected to repeated DNA sequence analysis to resolve uncalled bases, extended using the XL-PCR kit (Applied Biosystems, Foster City Calif.) in the 5′ and/or the 3′ direction, and resequenced, or which has been assembled from one or more overlapping cDNA, EST, or genomic DNA fragments using a computer program for fragment assembly, such as the GELVIEW fragment assembly system (GCG, Madison Wis.) or Phrap (University of Washington, Seattle Wash.). Some sequences have been both extended and assembled to produce the consensus sequence.

[0072] “Conservative amino acid substitutions” are those substitutions that are predicted to least interfere with the properties of the original protein, i.e., the structure and especially the function of the protein is conserved and not significantly changed by such substitutions. The table below shows amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative amino acid substitutions.

Original Residue    Conservative Substitution
Ala                 Gly, Ser
Arg                 His, Lys
Asn                 Asp, Gln, His
Asp                 Asn, Glu
Cys                 Ala, Ser
Gln                 Asn, Glu, His
Glu                 Asp, Gln, His
Gly                 Ala
His                 Asn, Arg, Gln, Glu
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 His, Met, Leu, Trp, Tyr
Ser                 Cys, Thr
Thr                 Ser, Val
Trp                 Phe, Tyr
Tyr                 His, Phe, Trp
Val                 Ile, Leu, Thr

[0073] Conservative amino acid substitutions generally maintain (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge or hydrophobicity of the molecule at the site of the substitution, and/or (c) the bulk of the side chain.
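As an illustration only, the conservative substitution table above can be encoded as a lookup structure. In the following Python sketch, the names `CONSERVATIVE` and `is_conservative` are hypothetical; the entries are transcribed from the table, and the lookup is deliberately directional because the table as written is not symmetric (for example, Ser is listed for Ala, but Ala is not listed for Ser).

```python
# Original residue -> residues listed in the table as conservative substitutions.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}


def is_conservative(original, replacement):
    """Return True if the table lists `replacement` for `original`.

    Residues absent from the table (e.g. Pro) have no listed substitutions.
    """
    return replacement in CONSERVATIVE.get(original, set())


print(is_conservative("Ala", "Ser"))  # True: Ser is listed for Ala
print(is_conservative("Ser", "Ala"))  # False: Ala is not listed for Ser
```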

[0074] A “deletion” refers to a change in the amino acid or nucleotide sequence that results in the absence of one or more amino acid residues or nucleotides.

[0075] The term “derivative” refers to a chemically modified polynucleotide or polypeptide. Chemical modifications of a polynucleotide can include, for example, replacement of hydrogen by an alkyl, acyl, hydroxyl, or amino group. A derivative polynucleotide encodes a polypeptide which retains at least one biological or immunological function of the natural molecule. A derivative polypeptide is one modified by glycosylation, pegylation, or any similar process that retains at least one biological or immunological function of the polypeptide from which it was derived.

[0076] A “detectable label” refers to a reporter molecule or enzyme that is capable of generating a measurable signal and is covalently or noncovalently joined to a polynucleotide or polypeptide.

[0077] “Differential expression” refers to increased or upregulated; or decreased, downregulated, or absent gene or protein expression, determined by comparing at least two different samples. Such comparisons may be carried out between, for example, a treated and an untreated sample, or a diseased and a normal sample.

[0078] “Exon shuffling” refers to the recombination of different coding regions (exons). Since an exon may represent a structural or functional domain of the encoded protein, new proteins may be assembled through the novel reassortment of stable substructures, thus allowing acceleration of the evolution of new protein functions.

[0079] A “fragment” is a unique portion of PRTS or the polynucleotide encoding PRTS which is identical in sequence to but shorter in length than the parent sequence. A fragment may comprise up to the entire length of the defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise from 5 to 1000 contiguous nucleotides or amino acid residues. A fragment used as a probe, primer, antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, 250 or at least 500 contiguous nucleotides or amino acid residues in length. Fragments may be preferentially selected from certain regions of a molecule. For example, a polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, including the Sequence Listing, tables, and figures, may be encompassed by the present embodiments.

[0080] A fragment of SEQ ID NO:22-42 comprises a region of unique polynucleotide sequence that specifically identifies SEQ ID NO:22-42, for example, as distinct from any other sequence in the genome from which the fragment was obtained. A fragment of SEQ ID NO:22-42 is useful, for example, in hybridization and amplification technologies and in analogous methods that distinguish SEQ ID NO:22-42 from related polynucleotide sequences. The precise length of a fragment of SEQ ID NO:22-42 and the region of SEQ ID NO:22-42 to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.

[0081] A fragment of SEQ ID NO:1-21 is encoded by a fragment of SEQ ID NO:22-42. A fragment of SEQ ID NO:1-21 comprises a region of unique amino acid sequence that specifically identifies SEQ ID NO:1-21. For example, a fragment of SEQ ID NO:1-21 is useful as an immunogenic peptide for the development of antibodies that specifically recognize SEQ ID NO:1-21. The precise length of a fragment of SEQ ID NO:1-21 and the region of SEQ ID NO:1-21 to which the fragment corresponds are routinely determinable by one of ordinary skill in the art based on the intended purpose for the fragment.

[0082] A “full length” polynucleotide sequence is one containing at least a translation initiation codon (e.g., methionine) followed by an open reading frame and a translation termination codon. A “full length” polynucleotide sequence encodes a “full length” polypeptide sequence.
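The structural requirement in this definition (an initiation codon, an open reading frame, and a termination codon) can be sketched in Python as follows. This is an illustration under stated assumptions, not a method taken from the specification: the function name `find_orf` is hypothetical, and the sketch assumes a DNA coding strand, ATG as the initiation codon (encoding methionine), and the three stop codons of the standard genetic code.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(seq):
    """Return the first ATG...stop open reading frame in `seq`, or None.

    The sequence is scanned left to right for an ATG; from each ATG the
    sequence is read in-frame, codon by codon, until a stop codon is found.
    The returned string includes both the ATG and the stop codon.
    """
    seq = seq.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
    return None


print(find_orf("CCATGAAATGA"))  # ATGAAATGA
print(find_orf("ATGAAACCC"))    # None: no in-frame stop codon
```

Under these assumptions, a sequence for which `find_orf` returns `None` lacks at least one of the elements required of a "full length" sequence.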

[0083] “Homology” refers to sequence similarity or, interchangeably, sequence identity, between two or more polynucleotide sequences or two or more polypeptide sequences.

[0084] The terms “percent identity” and “% identity,” as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences.
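A minimal Python sketch of the counting step follows. It assumes the two sequences have already been aligned by some algorithm, with gaps written as `-`, and it takes the number of alignment columns as the denominator; that choice is an assumption of this sketch, since alignment programs differ in how they treat gap columns. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue in both rows.

    Both inputs must be rows of one alignment, so they have equal length.
    Columns in which both rows are gaps are ignored; a column containing a
    single gap counts toward the denominator but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Six columns, four matching residues.
print(round(percent_identity("ACGT-A", "ACCTGA"), 1))  # 66.7
```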

[0085] Percent identity between polynucleotide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program. This program is part of the LASERGENE software package, a suite of molecular biological analysis programs (DNASTAR, Madison Wis.). CLUSTAL V is described in Higgins, D. G. and P. M. Sharp (1989) CABIOS 5:151-153 and in Higgins, D. G. et al. (1992) CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are set as follows: Ktuple=2, gap penalty=5, window=4, and “diagonals saved”=4. The “weighted” residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the “percent similarity” between aligned polynucleotide sequences.

[0086] Alternatively, a suite of commonly used and freely available sequence comparison algorithms is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) (Altschul, S. F. et al. (1990) J. Mol. Biol. 215:403-410), which is available from several sources, including the NCBI, Bethesda, Md., and on the Internet at http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence analysis programs including “blastn,” that is used to align a known polynucleotide sequence with other polynucleotide sequences from a variety of databases. Also available is a tool called “BLAST 2 Sequences” that is used for direct pairwise comparison of two nucleotide sequences. “BLAST 2 Sequences” can be accessed and used interactively at http://www.ncbi.nlm.nih.gov/gorf/bl2.html. The “BLAST 2 Sequences” tool can be used for both blastn and blastp (discussed below). BLAST programs are commonly used with gap and other parameters set to default settings. For example, to compare two nucleotide sequences, one may use blastn with the “BLAST 2 Sequences” tool Version 2.0.12 (April-21-2000) set at default parameters. Such default parameters may be, for example:

[0087] Matrix: BLOSUM62

[0088] Reward for match: 1

[0089] Penalty for mismatch: −2

[0090] Open Gap: 5 and Extension Gap: 2 penalties

[0091] Gap x drop-off: 50

[0092] Expect: 10

[0093] Word Size: 11

[0094] Filter: on

[0095] Percent identity may be measured over the length of an entire defined sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous nucleotides. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures, or Sequence Listing, may be used to describe a length over which percentage identity may be measured.

[0096] Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes in a nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
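The effect of degeneracy can be illustrated with a short Python sketch. The two 12-nucleotide sequences below are invented for illustration and are not taken from the Sequence Listing; the codon table is the standard genetic code, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence codon by codon (one-letter amino acids)."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Two invented sequences that use different codons for Leu, Ser, and Arg.
seq_a = "ATGCTTTCTCGT"
seq_b = "ATGTTGAGCAGA"

matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
print(translate(seq_a), translate(seq_b))  # MLSR MLSR
print(matches, "of", len(seq_a), "nucleotide positions match")  # 5 of 12
```

Here fewer than half of the nucleotide positions agree, yet both sequences encode the same four-residue peptide.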

[0097] The phrases “percent identity” and “% identity,” as applied to polypeptide sequences, refer to the percentage of residue matches between at least two polypeptide sequences aligned using a standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some alignment methods take into account conservative amino acid substitutions. Such conservative substitutions, explained in more detail above, generally preserve the charge and hydrophobicity at the site of substitution, thus preserving the structure (and therefore function) of the polypeptide.

[0098] Percent identity between polypeptide sequences may be determined using the default parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e sequence alignment program (described and referenced above). For pairwise alignments of polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap penalty=3, window=5, and “diagonals saved”=5. The PAM250 matrix is selected as the default residue weight table. As with polynucleotide alignments, the percent identity is reported by CLUSTAL V as the “percent similarity” between aligned polypeptide sequence pairs.

[0099] Alternatively, the NCBI BLAST software suite may be used. For example, for a pairwise comparison of two polypeptide sequences, one may use the “BLAST 2 Sequences” tool Version 2.0.12 (April-21-2000) with blastp set at default parameters. Such default parameters may be, for example:

[0100] Matrix: BLOSUM62

[0101] Open Gap: 11 and Extension Gap: 1 penalties

[0102] Gap x drop-off: 50

[0103] Expect: 10

[0104] Word Size: 3

[0105] Filter: on

[0106] Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polypeptide sequence, for instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least 150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the tables, figures or Sequence Listing, may be used to describe a length over which percentage identity may be measured.

[0107] “Human artificial chromosomes” (HACs) are linear microchromosomes which may contain DNA sequences of about 6 kb to 10 Mb in size and which contain all of the elements required for chromosome replication, segregation and maintenance.

[0108] The term “humanized antibody” refers to an antibody molecule in which the amino acid sequence in the non-antigen binding regions has been altered so that the antibody more closely resembles a human antibody, and still retains its original binding ability.

[0109] “Hybridization” refers to the process by which a polynucleotide strand anneals with a complementary strand through base pairing under defined hybridization conditions. Specific hybridization is an indication that two nucleic acid sequences share a high degree of complementarity. Specific hybridization complexes form under permissive annealing conditions and remain hybridized after the “washing” step(s). The washing step(s) is particularly important in determining the stringency of the hybridization process, with more stringent conditions allowing less non-specific binding, i.e., binding between pairs of nucleic acid strands that are not perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely determinable by one of ordinary skill in the art and may be consistent among hybridization experiments, whereas wash conditions may be varied among experiments to achieve the desired stringency, and therefore hybridization specificity. Permissive annealing conditions occur, for example, at 68° C. in the presence of about 6×SSC, about 1% (w/v) SDS, and about 100 μg/ml sheared, denatured salmon sperm DNA.

[0110] Generally, stringency of hybridization is expressed, in part, with reference to the temperature under which the wash step is carried out. Such wash temperatures are typically selected to be about 5° C. to 20° C. lower than the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH. The T_(m) is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. An equation for calculating T_(m) and conditions for nucleic acid hybridization are well known and can be found in Sambrook, J. et al. (1989) Molecular Cloning: A Laboratory Manual, 2^(nd) ed., vol. 1-3, Cold Spring Harbor Press, Plainview N.Y.; specifically see volume 2, chapter 9.
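As a minimal sketch of the relationship stated above, the range of typical wash temperatures may be derived from a T_(m) value that has been determined separately (for example, by the equation referenced in Sambrook). The function name and argument names are hypothetical; the sketch takes T_(m) as an input and does not itself compute it.

```python
from typing import Tuple


def wash_temperature_window(
    tm_celsius: float,
    min_offset: float = 5.0,   # wash at least about 5 deg C below Tm
    max_offset: float = 20.0,  # wash at most about 20 deg C below Tm
) -> Tuple[float, float]:
    """Return (lowest, highest) typical wash temperature in deg C for a given Tm."""
    if min_offset > max_offset:
        raise ValueError("min_offset must not exceed max_offset")
    return (tm_celsius - max_offset, tm_celsius - min_offset)
```

For instance, `wash_temperature_window(88.0)` gives a window whose lower bound is 68.0 and whose upper bound is 83.0.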

[0111] High stringency conditions for hybridization between polynucleotides of the present invention include wash conditions of 68° C. in the presence of about 0.2×SSC and about 0.1% SDS, for 1 hour. Alternatively, temperatures of about 65° C., 60° C., 55° C., or 42° C. may be used. SSC concentration may be varied from about 0.1 to 2×SSC, with SDS being present at about 0.1%. Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, sheared and denatured salmon sperm DNA at about 100-200 μg/ml. Organic solvent, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as for RNA:DNA hybridizations. Useful variations on these wash conditions will be readily apparent to those of ordinary skill in the art. Hybridization, particularly under high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. Such similarity is strongly indicative of a similar role for the nucleotides and their encoded polypeptides.

[0112] The term “hybridization complex” refers to a complex formed between two nucleic acid sequences by virtue of the formation of hydrogen bonds between complementary bases. A hybridization complex may be formed in solution (e.g., C₀t or R₀t analysis) or formed between one nucleic acid sequence present in solution and another nucleic acid sequence immobilized on a solid support (e.g., paper, membranes, filters, chips, pins or glass slides, or any other appropriate substrate to which cells or their nucleic acids have been fixed).

[0113] The words “insertion” and “addition” refer to changes in an amino acid or nucleotide sequence resulting in the addition of one or more amino acid residues or nucleotides, respectively.

[0114] “Immune response” can refer to conditions associated with inflammation, trauma, immune disorders, or infectious or genetic disease, etc. These conditions can be characterized by expression of various factors, e.g., cytokines, chemokines, and other signaling molecules, which may affect cellular and systemic defense systems.

[0115] An “immunogenic fragment” is a polypeptide or oligopeptide fragment of PRTS which is capable of eliciting an immune response when introduced into a living organism, for example, a mammal. The term “immunogenic fragment” also includes any polypeptide or oligopeptide fragment of PRTS which is useful in any of the antibody production methods disclosed herein or known in the art.

[0116] The term “microarray” refers to an arrangement of a plurality of polynucleotides, polypeptides, or other chemical compounds on a substrate.

[0117] The terms “element” and “array element” refer to a polynucleotide, polypeptide, or other chemical compound having a unique and defined position on a microarray.

[0118] The term “modulate” refers to a change in the activity of PRTS. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional, or immunological properties of PRTS.

[0119] The phrases “nucleic acid” and “nucleic acid sequence” refer to a nucleotide, oligonucleotide, polynucleotide, or any fragment thereof. These phrases also refer to DNA or RNA of genomic or synthetic origin which may be single-stranded or double-stranded and may represent the sense or the antisense strand, to peptide nucleic acid (PNA), or to any DNA-like or RNA-like material.

[0120] “Operably linked” refers to the situation in which a first nucleic acid sequence is placed in a functional relationship with a second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Operably linked DNA sequences may be in close proximity or contiguous and, where necessary to join two protein coding regions, in the same reading frame.

[0121] “Peptide nucleic acid” (PNA) refers to an antisense molecule or anti-gene agent which comprises an oligonucleotide of at least about 5 nucleotides in length linked to a peptide backbone of amino acid residues ending in lysine. The terminal lysine confers solubility to the composition. PNAs preferentially bind complementary single stranded DNA or RNA and stop transcript elongation, and may be pegylated to extend their lifespan in the cell.

[0122] “Post-translational modification” of a PRTS may involve lipidation, glycosylation, phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in the art. These processes may occur synthetically or biochemically. Biochemical modifications will vary by cell type depending on the enzymatic milieu of PRTS.

[0123] “Probe” refers to nucleic acid sequences encoding PRTS, their complements, or fragments thereof, which are used to detect identical, allelic or related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes. “Primers” are short nucleic acids, usually DNA oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing. The primer may then be extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR).

[0124] Probes and primers as used in the present invention typically comprise at least 15 contiguous nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also be employed, such as probes and primers that comprise at least 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, or at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may be considerably longer than these examples, and it is understood that any length supported by the specification, including the tables, figures, and Sequence Listing, may be used.
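For illustration only, the enumeration of candidate probe or primer sequences of a given length from a known sequence may be sketched as below. The function name is hypothetical, and the sketch merely lists contiguous fragments; it does not evaluate melting temperature, specificity, or any other selection criterion applied by primer design programs.

```python
from typing import Iterator


def contiguous_fragments(sequence: str, length: int = 15) -> Iterator[str]:
    """Yield every contiguous fragment of `length` nucleotides from `sequence`."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    # A sequence shorter than `length` yields no fragments.
    for start in range(len(sequence) - length + 1):
        yield sequence[start:start + length]
```

A longer fragment length, such as 20 or 25, may be passed as the second argument where greater specificity is desired.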

[0125] Methods for preparing and using probes and primers are described in the references, for example Sambrook, J. et al. (1989) Molecular Cloning: A Laboratory Manual, 2^(nd) ed., vol. 1-3, Cold Spring Harbor Press, Plainview N.Y.; Ausubel, F. M. et al. (1987) Current Protocols in Molecular Biology, Greene Publ. Assoc. & Wiley-Intersciences, New York N.Y.; Innis, M. et al. (1990) PCR Protocols, A Guide to Methods and Applications, Academic Press, San Diego Calif. PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge Mass.).

[0126] Oligonucleotides for use as primers are selected using software known in the art for such purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to 100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to 5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer selection programs have incorporated additional features for expanded capabilities. For example, the PrimOU primer selection program (available to the public from the Genome Center at University of Texas South West Medical Center, Dallas Tex.) is capable of choosing specific primers from megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3 primer selection program (available to the public from the Whitehead Institute/MIT Center for Genome Research, Cambridge Mass.) allows the user to input a “mispriming library,” in which sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the selection of oligonucleotides for microarrays. (The source code for the latter two primer selection programs may also be obtained from their respective sources and modified to meet the user's specific needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments, thereby allowing selection of primers that hybridize to either the most conserved or least conserved regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both unique and conserved oligonucleotides and polynucleotide fragments. The oligonucleotides and polynucleotide fragments identified by any of the above selection methods are useful in hybridization technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of oligonucleotide selection are not limited to those described above.

[0127] A “recombinant nucleic acid” is a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two or more otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to transform a cell.

[0128] Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is expressed, inducing a protective immunological response in the mammal.

[0129] A “regulatory element” refers to a nucleic acid sequence usually derived from untranslated regions of a gene and includes enhancers, promoters, introns, and 5′ and 3′ untranslated regions (UTRs). Regulatory elements interact with host or viral proteins which control transcription, translation, or RNA stability.

[0130] “Reporter molecules” are chemical or biochemical moieties used for labeling a nucleic acid, amino acid, or antibody. Reporter molecules include radionuclides; enzymes; fluorescent, chemiluminescent, or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known in the art.

[0131] An “RNA equivalent,” in reference to a DNA sequence, is composed of the same linear sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
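At the level of the written sequence, the RNA equivalent defined above may be sketched as a simple substitution. The function name is hypothetical, and the sketch assumes the DNA sequence is given as a string of the letters A, C, G, and T; the change of sugar backbone has no representation in the letter sequence.

```python
def rna_equivalent(dna_sequence: str) -> str:
    """Return the RNA equivalent of a DNA sequence by replacing thymine with uracil."""
    # Handle both upper-case and lower-case sequence notation.
    return dna_sequence.replace("T", "U").replace("t", "u")
```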

[0132] The term “sample” is used in its broadest sense. A sample suspected of containing PRTS, nucleic acids encoding PRTS, or fragments thereof may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; a cell; genomic DNA, RNA, or cDNA, in solution or bound to a substrate; a tissue; a tissue print; etc.

[0133] The terms “specific binding” and “specifically binding” refer to that interaction between a protein or peptide and an agonist, an antibody, an antagonist, a small molecule, or any natural or synthetic binding composition. The interaction is dependent upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, recognized by the binding molecule. For example, if an antibody is specific for epitope “A,” the presence of a polypeptide comprising the epitope A, or the presence of free unlabeled A, in a reaction containing free labeled A and the antibody will reduce the amount of labeled A that binds to the antibody.

[0134] The term “substantially purified” refers to nucleic acid or amino acid sequences that are removed from their natural environment and are isolated or separated, and are at least 60% free, preferably at least 75% free, and most preferably at least 90% free from other components with which they are naturally associated.

[0135] A “substitution” refers to the replacement of one or more amino acid residues or nucleotides by different amino acid residues or nucleotides, respectively.

[0136] “Substrate” refers to any suitable rigid or semi-rigid support including membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles and capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.

[0137] A “transcript image” refers to the collective pattern of gene expression by a particular cell type or tissue under given conditions at a given time.

[0138] “Transformation” describes a process by which exogenous DNA is introduced into a recipient cell. Transformation may occur under natural or artificial conditions according to various methods well known in the art, and may rely on any known method for the insertion of foreign nucleic acid sequences into a prokaryotic or eukaryotic host cell. The method for transformation is selected based on the type of host cell being transformed and may include, but is not limited to, bacteriophage or viral infection, electroporation, heat shock, lipofection, and particle bombardment. The term “transformed cells” includes stably transformed cells in which the inserted DNA is capable of replication either as an autonomously replicating plasmid or as part of the host chromosome, as well as transiently transformed cells which express the inserted DNA or RNA for limited periods of time.

[0139] A “transgenic organism,” as used herein, is any organism, including but not limited to animals and plants, in which one or more of the cells of the organism contains heterologous nucleic acid introduced by way of human intervention, such as by transgenic techniques well known in the art. The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The transgenic organisms contemplated in accordance with the present invention include bacteria, cyanobacteria, fungi, plants and animals. The isolated DNA of the present invention can be introduced into the host by methods known in the art, for example infection, transfection, transformation or transconjugation. Techniques for transferring the DNA of the present invention into such organisms are widely known and provided in references such as Sambrook et al. (1989), supra.

[0140] A “variant” of a particular nucleic acid sequence is defined as a nucleic acid sequence having at least 40% sequence identity to the particular nucleic acid sequence over a certain length of one of the nucleic acid sequences using blastn with the “BLAST 2 Sequences” tool Version 2.0.9 (May 7, 1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length. A variant may be described as, for example, an “allelic” (as defined above), “splice,” “species,” or “polymorphic” variant. A splice variant may have significant identity to a reference molecule, but will generally have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA processing. The corresponding polypeptide may possess additional functional domains or lack domains that are present in the reference molecule. Species variants are polynucleotide sequences that vary from one species to another. The resulting polypeptides will generally have significant amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide sequence of a particular gene between individuals of a given species. Polymorphic variants also may encompass “single nucleotide polymorphisms” (SNPs) in which the polynucleotide sequence varies by one nucleotide base. The presence of SNPs may be indicative of, for example, a certain population, a disease state, or a propensity for a disease state.

[0141] A “variant” of a particular polypeptide sequence is defined as a polypeptide sequence having at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of the polypeptide sequences using blastp with the “BLAST 2 Sequences” tool Version 2.0.9 (May 7, 1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% or greater sequence identity over a certain defined length of one of the polypeptides.
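The identity thresholds recited in the two “variant” definitions above may be sketched, for illustration, as a lookup of the highest recited threshold that a given percent identity satisfies. The names below are hypothetical, and the percent identity value is assumed to have been obtained separately, for example from a “BLAST 2 Sequences” comparison; the sketch performs no alignment itself.

```python
from typing import Optional, Sequence

# Thresholds recited for nucleic acid variants (includes 85%).
NUCLEIC_ACID_TIERS = (40, 50, 60, 70, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)
# Thresholds recited for polypeptide variants (no 85% tier is recited).
POLYPEPTIDE_TIERS = (40, 50, 60, 70, 80, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)


def highest_tier_met(percent_identity: float, tiers: Sequence[int]) -> Optional[int]:
    """Return the highest recited threshold satisfied, or None if below all of them."""
    met = [tier for tier in tiers if percent_identity >= tier]
    return max(met) if met else None


def is_variant(percent_identity: float, tiers: Sequence[int]) -> bool:
    """True when the identity reaches the lowest recited threshold (40%)."""
    return highest_tier_met(percent_identity, tiers) is not None
```

For example, a pair of sequences showing 87.5% identity meets the “at least 85%” tier for nucleic acids but only the “at least 80%” tier for polypeptides.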

[0142] The Invention

[0143] The invention is based on the discovery of new human proteases (PRTS), the polynucleotides encoding PRTS, and the use of these compositions for the diagnosis, treatment, or prevention of gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders.

[0144] Table 1 summarizes the nomenclature for the full length polynucleotide and polypeptide sequences of the invention. Each polynucleotide and its corresponding polypeptide are correlated to a single Incyte project identification number (Incyte Project ID). Each polypeptide sequence is denoted by both a polypeptide sequence identification number (Polypeptide SEQ ID NO:) and an Incyte polypeptide sequence number (Incyte Polypeptide ID) as shown. Each polynucleotide sequence is denoted by both a polynucleotide sequence identification number (Polynucleotide SEQ ID NO:) and an Incyte polynucleotide consensus sequence number (Incyte Polynucleotide ID) as shown.

[0145] Table 2 shows sequences with homology to the polypeptides of the invention as identified by BLAST analysis against the GenBank protein (genpept) database. Columns 1 and 2 show the polypeptide sequence identification number (Polypeptide SEQ ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for polypeptides of the invention. Column 3 shows the GenBank identification number (Genbank ID NO:) of the nearest GenBank homolog. Column 4 shows the probability score for the match between each polypeptide and its GenBank homolog. Column 5 shows the annotation of the GenBank homolog along with relevant citations where applicable, all of which are expressly incorporated by reference herein.

[0146] Table 3 shows various structural features of the polypeptides of the invention. Columns 1 and 2 show the polypeptide sequence identification number (SEQ ID NO:) and the corresponding Incyte polypeptide sequence number (Incyte Polypeptide ID) for each polypeptide of the invention. Column 3 shows the number of amino acid residues in each polypeptide. Column 4 shows potential phosphorylation sites, and column 5 shows potential glycosylation sites, as determined by the MOTIFS program of the GCG sequence analysis software package (Genetics Computer Group, Madison Wis.). Column 6 shows amino acid residues comprising signature sequences, domains, and motifs. Column 7 shows analytical methods for protein structure/function analysis and in some cases, searchable databases to which the analytical methods were applied.

[0147] Together, Tables 2 and 3 summarize the properties of polypeptides of the invention, and these properties establish that the claimed polypeptides are proteases. For example, SEQ ID NO:1 is 85% identical to human calpain 3; calcium activated neutral protease (GenBank ID g7684607) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:1 also contains a calpain family cysteine protease domain, an EF-hand domain and a calpain large subunit, domain III as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS and MOTIFS analyses provide further corroborative evidence that SEQ ID NO:1 is a protease. In an alternative example, SEQ ID NO:5 is 89% identical to human ubiquitin hydrolyzing enzyme I (GenBank ID g3220154) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:5 also contains a ubiquitin carboxyl terminal hydrolase active site domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS and MOTIFS analyses provide further corroborative evidence that SEQ ID NO:5 is a ubiquitin protease. In another alternative example, SEQ ID NO:15 has 56% local identity to mouse mast cell metalloprotease-6 (GenBank ID g200507) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 1.7e-60, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:15 also contains a trypsin family serine protease active site domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) The presence of this domain is confirmed by BLIMPS, MOTIFS, and PROFILESCAN analyses. BLIMPS analysis also reveals the presence of kringle and type I fibronectin domains, providing further corroborative evidence that SEQ ID NO:15 is a serine protease of the trypsin family. In yet another alternative example, SEQ ID NO:17 has 36% local identity to limulus coagulation factor C precursor (GenBank ID g217397) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 5.1e-53, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:17 also contains a trypsin family protease active site domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) This same analysis reveals the presence of CUB and EGF-like domains. Data from BLIMPS, MOTIFS, and PROFILESCAN analyses provide further corroborative evidence that SEQ ID NO:17 is a serine protease of the trypsin family. In still another alternative example, SEQ ID NO:18 is 93% identical to human disintegrin and metalloprotease domain 19 (GenBank ID g6651071) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:18 also contains a neutral zinc metalloprotease active site and a disintegrin domain as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS, MOTIFS, and PROFILESCAN analyses provide further corroborative evidence that SEQ ID NO:18 is a metalloprotease of the ADAM family. In an alternative example, SEQ ID NO:20 has 73% local identity to mouse ubiquitin specific protease (GenBank ID g7673618) as determined by the Basic Local Alignment Search Tool (BLAST). (See Table 2.) The BLAST probability score is 0.0, which indicates the probability of obtaining the observed polypeptide sequence alignment by chance. SEQ ID NO:20 also contains ubiquitin carboxyl-terminal hydrolase active site domains as determined by searching for statistically significant matches in the hidden Markov model (HMM)-based PFAM database of conserved protein family domains. (See Table 3.) Data from BLIMPS and MOTIFS analyses provide further corroborative evidence that SEQ ID NO:20 is a ubiquitin specific protease. SEQ ID NO:2-4, SEQ ID NO:6-14, SEQ ID NO:16, SEQ ID NO:19 and SEQ ID NO:21 were analyzed and annotated in a similar manner. The algorithms and parameters for the analysis of SEQ ID NO:1-21 are described in Table 7.

[0148] As shown in Table 4, the full length polynucleotide sequences of the present invention were assembled using cDNA sequences or coding (exon) sequences derived from genomic DNA, or any combination of these two types of sequences. Columns 1 and 2 list the polynucleotide sequence identification number (Polynucleotide SEQ ID NO:) and the corresponding Incyte polynucleotide consensus sequence number (Incyte Polynucleotide ID) for each polynucleotide of the invention. Column 3 shows the length of each polynucleotide sequence in basepairs. Column 4 lists fragments of the polynucleotide sequences which are useful, for example, in hybridization or amplification technologies that identify SEQ ID NO:22-42 or that distinguish between SEQ ID NO:22-42 and related polynucleotide sequences. Column 5 shows identification numbers corresponding to cDNA sequences, coding sequences (exons) predicted from genomic DNA, and/or sequence assemblages comprised of both cDNA and genomic DNA. These sequences were used to assemble the full length polynucleotide sequences of the invention. Columns 6 and 7 of Table 4 show the nucleotide start (5′) and stop (3′) positions of the cDNA and/or genomic sequences in column 5 relative to their respective full length sequences.

[0149] The identification numbers in Column 5 of Table 4 may refer specifically, for example, to Incyte cDNAs along with their corresponding cDNA libraries. For example, 4847254F8 is the identification number of an Incyte cDNA sequence, and SPLNTUT02 is the cDNA library from which it is derived. Incyte cDNAs for which cDNA libraries are not indicated were derived from pooled cDNA libraries (e.g., 71666762V1). Alternatively, the identification numbers in column 5 may refer to GenBank cDNAs or ESTs (e.g., g7377067) which contributed to the assembly of the full length polynucleotide sequences. In addition, the identification numbers in column 5 may identify sequences derived from the ENSEMBL (The Sanger Centre, Cambridge, UK) database (i.e., those sequences including the designation “ENST”). Alternatively, the identification numbers in column 5 may be derived from the NCBI RefSeq Nucleotide Sequence Records Database (i.e., those sequences including the designation “NM” or “NT”) or the NCBI RefSeq Protein Sequence Records (i.e., those sequences including the designation “NP”). Alternatively, the identification numbers in column 5 may refer to assemblages of both cDNA and Genscan-predicted exons brought together by an “exon stitching” algorithm. For example, FL_XXXXXX_N₁_N₂_YYYYY_N₃_N₄ represents a “stitched” sequence in which XXXXXX is the identification number of the cluster of sequences to which the algorithm was applied, and YYYYY is the number of the prediction generated by the algorithm, and N_(1,2,3), if present, represent specific exons that may have been manually edited during analysis (See Example V). Alternatively, the identification numbers in column 5 may refer to assemblages of exons brought together by an “exon-stretching” algorithm. For example, FLXXXXXX_gAAAAA_gBBBBB_1_N is the identification number of a “stretched” sequence, with XXXXXX being the Incyte project identification number, gAAAAA being the GenBank identification number of the human genomic sequence to which the “exon-stretching” algorithm was applied, gBBBBB being the GenBank identification number or NCBI RefSeq identification number of the nearest GenBank protein homolog, and N referring to specific exons (See Example V). In instances where a RefSeq sequence was used as a protein homolog for the “exon-stretching” algorithm, a RefSeq identifier (denoted by “NM,” “NP,” or “NT”) may be used in place of the GenBank identifier (i.e., gBBBBB).

[0150] Alternatively, a prefix identifies component sequences that were hand-edited, predicted from genomic DNA sequences, or derived from a combination of sequence analysis methods. The following Table lists examples of component sequence prefixes and corresponding sequence analysis methods associated with the prefixes (see Example IV and Example V).

Prefix | Type of analysis and/or examples of programs
GNN, GFG, ENST | Exon prediction from genomic sequences using, for example, GENSCAN (Stanford University, CA, USA) or FGENES (Computer Genomics Group, The Sanger Centre, Cambridge, UK).
GBI | Hand-edited analysis of genomic sequences.
FL | Stitched or stretched genomic sequences (see Example V).
INCY | Full length transcript and exon prediction from mapping of EST sequences to the genome. Genomic location and EST composition data are combined to predict the exons and resulting transcript.
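The identifier conventions described above can be illustrated with a short Python sketch that sorts an identification number into one of the described categories. The regular expressions and the function name `classify_identifier` are assumptions made for illustration only; the text gives example identifiers and designations, not a formal grammar.

```python
import re

# Patterns are illustrative guesses based on the designations and examples
# in the text; they are not a published identifier specification.
_RULES = [
    ("stitched or stretched sequence (FL)", re.compile(r"^FL")),
    ("ENSEMBL sequence", re.compile(r"^ENST")),
    ("NCBI RefSeq record", re.compile(r"^(NM|NT|NP)")),
    ("exon prediction from genomic sequence", re.compile(r"^(GNN|GFG)")),
    ("hand-edited genomic sequence", re.compile(r"^GBI")),
    ("EST-to-genome transcript prediction", re.compile(r"^INCY")),
    ("GenBank cDNA or EST", re.compile(r"^g\d+$")),
    ("Incyte cDNA", re.compile(r"^\d+[A-Z]\d+$")),
]


def classify_identifier(identifier: str) -> str:
    """Return the first category whose pattern matches the identifier."""
    for label, pattern in _RULES:
        if pattern.match(identifier):
            return label
    return "unrecognized"


if __name__ == "__main__":
    for example in ("4847254F8", "71666762V1", "g7377067"):
        print(example, "->", classify_identifier(example))
```

The rules are checked in order, so the more specific prefixes are tested before the generic numeric pattern assumed here for Incyte cDNAs.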

[0151] In some cases, Incyte cDNA coverage redundant with the sequence coverage shown in column 5 was obtained to confirm the final consensus polynucleotide sequence, but the relevant Incyte cDNA identification numbers are not shown.

[0152] Table 5 shows the representative cDNA libraries for those full length polynucleotide sequences which were assembled using Incyte cDNA sequences. The representative cDNA library is the Incyte cDNA library which is most frequently represented by the Incyte cDNA sequences which were used to assemble and confirm the above polynucleotide sequences. The tissues and vectors which were used to construct the cDNA libraries shown in Table 5 are described in Table 6.

[0153] The invention also encompasses PRTS variants. A preferred PRTS variant is one which has at least about 80%, or alternatively at least about 90%, or even at least about 95% amino acid sequence identity to the PRTS amino acid sequence, and which contains at least one functional or structural characteristic of PRTS.

[0154] The invention also encompasses polynucleotides which encode PRTS. In a particular embodiment, the invention encompasses a polynucleotide sequence comprising a sequence selected from the group consisting of SEQ ID NO:22-42, which encodes PRTS. The polynucleotide sequences of SEQ ID NO:22-42, as presented in the Sequence Listing, embrace the equivalent RNA sequences, wherein occurrences of the nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose instead of deoxyribose.
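At the level of sequence text, the DNA-to-RNA equivalence described above amounts to replacing each thymine with uracil. A minimal Python sketch follows; the function name `dna_to_rna` is illustrative, and the change of sugar backbone has no representation in a plain sequence string.

```python
def dna_to_rna(dna: str) -> str:
    """Return the RNA equivalent of a DNA sequence (T -> U), preserving case."""
    return dna.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    print(dna_to_rna("ATGTTTGCT"))
```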

[0155] The invention also encompasses a variant of a polynucleotide sequence encoding PRTS. In particular, such a variant polynucleotide sequence will have at least about 70%, or alternatively at least about 85%, or even at least about 95% polynucleotide sequence identity to the polynucleotide sequence encoding PRTS. A particular aspect of the invention encompasses a variant of a polynucleotide sequence comprising a sequence selected from the group consisting of SEQ ID NO:22-42 which has at least about 70%, or alternatively at least about 85%, or even at least about 95% polynucleotide sequence identity to a nucleic acid sequence selected from the group consisting of SEQ ID NO:22-42. Any one of the polynucleotide variants described above can encode an amino acid sequence which contains at least one functional or structural characteristic of PRTS.
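As a rough illustration of the identity thresholds above, the following Python sketch computes percent identity between two sequences that have already been aligned, with "-" marking gaps. Counting identical columns over all columns that are not gaps in both sequences is a simplifying assumption made here; it is not the alignment-program definition of percent identity that the specification relies on elsewhere. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Columns that are gaps in both sequences are skipped; a gap opposite a
    residue counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold_pct: float) -> bool:
    """True if the pair reaches the given identity threshold (e.g., 70, 85, 95)."""
    return percent_identity(aligned_a, aligned_b) >= threshold_pct


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```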

[0156] It will be appreciated by those skilled in the art that as a result of the degeneracy of the genetic code, a multitude of polynucleotide sequences encoding PRTS, some bearing minimal similarity to the polynucleotide sequences of any known and naturally occurring gene, may be produced. Thus, the invention contemplates each and every possible variation of polynucleotide sequence that could be made by selecting combinations based on possible codon choices. These combinations are made in accordance with the standard triplet genetic code as applied to the polynucleotide sequence of naturally occurring PRTS, and all such variations are to be considered as being specifically disclosed.
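The size of this “multitude” can be made concrete with a small Python sketch that counts how many distinct coding sequences encode a given peptide under the standard genetic code, by multiplying the number of synonymous codons at each position. The function name and the example peptides are illustrative and are not taken from the Sequence Listing.

```python
# Number of codons encoding each amino acid in the standard genetic code
# (one-letter amino acid codes; the 20 values sum to the 61 sense codons).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(peptide: str) -> int:
    """Return the number of distinct nucleotide sequences encoding the peptide."""
    total = 1
    for residue in peptide.upper():
        if residue not in CODON_COUNT:
            raise ValueError(f"unknown amino acid: {residue!r}")
        total *= CODON_COUNT[residue]
    return total


if __name__ == "__main__":
    # A short hypothetical peptide; full-length proteins give far larger counts.
    print(count_coding_sequences("MARK"))
```

Because the count is a product over residues, it grows roughly exponentially with peptide length.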

[0157] Although nucleotide sequences which encode PRTS and its variants are generally capable of hybridizing to the nucleotide sequence of the naturally occurring PRTS under appropriately selected conditions of stringency, it may be advantageous to produce nucleotide sequences encoding PRTS or its derivatives possessing a substantially different codon usage, e.g., inclusion of non-naturally occurring codons. Codons may be selected to increase the rate at which expression of the peptide occurs in a particular prokaryotic or eukaryotic host in accordance with the frequency with which particular codons are utilized by the host. Other reasons for substantially altering the nucleotide sequence encoding PRTS and its derivatives without altering the encoded amino acid sequences include the production of RNA transcripts having more desirable properties, such as a greater half-life, than transcripts produced from the naturally occurring sequence.
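Codon selection for a particular host can be sketched as a back-translation in which each amino acid is assigned a single preferred codon. In the Python sketch below, the table `PREFERRED_CODON` is a hypothetical placeholder: each entry is a valid codon for its amino acid under the standard genetic code, but the choices are not measured usage frequencies for any host and would be replaced by a real codon usage table in practice.

```python
# Hypothetical "preferred codon" table for an unspecified host. Each codon
# does encode the listed amino acid, but the preferences are placeholders.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(peptide: str, table: dict = PREFERRED_CODON) -> str:
    """Build a coding sequence for the peptide using one codon per amino acid."""
    try:
        return "".join(table[residue] for residue in peptide.upper())
    except KeyError as err:
        raise ValueError(f"unknown amino acid: {err.args[0]!r}") from None


if __name__ == "__main__":
    print(back_translate("MKL"))
```

Since only synonymous codons are substituted, the encoded amino acid sequence is unchanged while the nucleotide sequence may differ substantially from the naturally occurring one.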

[0158] The invention also encompasses production of DNA sequences which encode PRTS and PRTS derivatives, or fragments thereof, entirely by synthetic chemistry. After production, the synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents well known in the art. Moreover, synthetic chemistry may be used to introduce mutations into a sequence encoding PRTS or any fragment thereof.

[0159] Also encompassed by the invention are polynucleotide sequences that are capable of hybridizing to the claimed polynucleotide sequences, and, in particular, to those shown in SEQ ID NO:22-42 and fragments thereof under various conditions of stringency. (See, e.g., Wahl, G. M. and S. L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A. R. (1987) Methods Enzymol. 152:507-511.) Hybridization conditions, including annealing and wash conditions, are described in “Definitions.”

[0160] Methods for DNA sequencing are well known in the art and may be used to practice any of the embodiments of the invention. The methods may employ such enzymes as the Klenow fragment of DNA polymerase I, SEQUENASE (US Biochemical, Cleveland Ohio), Taq polymerase (Applied Biosystems), thermostable T7 polymerase (Amersham Pharmacia Biotech, Piscataway N.J.), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies, Gaithersburg Md.). Preferably, sequence preparation is automated with machines such as the MICROLAB 2200 liquid transfer system (Hamilton, Reno Nev.), PTC200 thermal cycler (MJ Research, Watertown Mass.) and ABI CATALYST 800 thermal cycler (Applied Biosystems). Sequencing is then carried out using either the ABI 373 or 377 DNA sequencing system (Applied Biosystems), the MEGABACE 1000 DNA sequencing system (Molecular Dynamics, Sunnyvale Calif.), or other systems known in the art. The resulting sequences are analyzed using a variety of algorithms which are well known in the art. (See, e.g., Ausubel, F. M. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York N.Y., unit 7.7; Meyers, R. A. (1995) Molecular Biology and Biotechnology, Wiley VCH, New York N.Y., pp. 856-853.)

[0161] The nucleic acid sequences encoding PRTS may be extended utilizing a partial nucleotide sequence and employing various PCR-based methods known in the art to detect upstream sequences, such as promoters and regulatory elements. For example, one method which may be employed, restriction-site PCR, uses universal and nested primers to amplify unknown sequence from genomic DNA within a cloning vector. (See, e.g., Sarkar, G. (1993) PCR Methods Applic. 2:318-322.) Another method, inverse PCR, uses primers that extend in divergent directions to amplify unknown sequence from a circularized template. The template is derived from restriction fragments comprising a known genomic locus and surrounding sequences. (See, e.g., Triglia, T. et al. (1988) Nucleic Acids Res. 16:8186.) A third method, capture PCR, involves PCR amplification of DNA fragments adjacent to known sequences in human and yeast artificial chromosome DNA. (See, e.g., Lagerstrom, M. et al. (1991) PCR Methods Applic. 1:111-119.) In this method, multiple restriction enzyme digestions and ligations may be used to insert an engineered double-stranded sequence into a region of unknown sequence before performing PCR. Other methods which may be used to retrieve unknown sequences are known in the art. (See, e.g., Parker, J. D. et al. (1991) Nucleic Acids Res. 19:3055-3060). Additionally, one may use PCR, nested primers, and PROMOTERFINDER libraries (Clontech, Palo Alto Calif.) to walk genomic DNA. This procedure avoids the need to screen libraries and is useful in finding intron/exon junctions. For all PCR-based methods, primers may be designed using commercially available software, such as OLIGO 4.06 primer analysis software (National Biosciences, Plymouth Minn.) or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the template at temperatures of about 68° C. to 72° C.
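The primer design criteria stated above (about 22 to 30 nucleotides, GC content of about 50% or more, annealing at about 68° C. to 72° C.) can be checked with a short Python sketch. The melting temperature estimate uses the basic GC-content approximation Tm = 64.9 + 41 × (G + C − 16.4) / N, and the estimated Tm is compared directly against the stated temperature window; both are assumptions of this sketch and are not the calculation performed by OLIGO 4.06 or any other named program. The function names and example primers are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm(primer: str) -> float:
    """Basic GC-content Tm approximation (assumed here, see lead-in)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def check_primer(primer: str) -> dict:
    """Report which of the three stated criteria the primer satisfies."""
    if not primer:
        raise ValueError("primer must not be empty")
    tm = estimate_tm(primer)
    return {
        "length_ok": 22 <= len(primer) <= 30,
        "gc_ok": gc_fraction(primer) >= 0.50,
        "tm_ok": 68.0 <= tm <= 72.0,
        "tm_estimate": round(tm, 2),
    }


if __name__ == "__main__":
    # Hypothetical primer sequences, used only to exercise the checks.
    print(check_primer("GCGCGCGCGCGCGCGCGCGCATATATATAT"))
    print(check_primer("ATGCATGCATGCATGCATGCATGC"))
```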

[0162] When screening for full length cDNAs, it is preferable to use libraries that have been size-selected to include larger cDNAs. In addition, random-primed libraries, which often include sequences containing the 5′ regions of genes, are preferable for situations in which an oligo d(T) library does not yield a full-length cDNA. Genomic libraries may be useful for extension of sequence into 5′ non-transcribed regulatory regions.

[0163] Capillary electrophoresis systems which are commercially available may be used to analyze the size or confirm the nucleotide sequence of sequencing or PCR products. In particular, capillary sequencing may employ flowable polymers for electrophoretic separation, four different nucleotide-specific, laser-stimulated fluorescent dyes, and a charge coupled device camera for detection of the emitted wavelengths. Output/light intensity may be converted to electrical signal using appropriate software (e.g., GENOTYPER and SEQUENCE NAVIGATOR, Applied Biosystems), and the entire process from loading of samples to computer analysis and electronic data display may be computer controlled. Capillary electrophoresis is especially preferable for sequencing small DNA fragments which may be present in limited amounts in a particular sample.

[0164] In another embodiment of the invention, polynucleotide sequences or fragments thereof which encode PRTS may be cloned in recombinant DNA molecules that direct expression of PRTS, or fragments or functional equivalents thereof, in appropriate host cells. Due to the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the same or a functionally equivalent amino acid sequence may be produced and used to express PRTS.

[0165] The nucleotide sequences of the present invention can be engineered using methods generally known in the art in order to alter PRTS-encoding sequences for a variety of purposes including, but not limited to, modification of the cloning, processing, and/or expression of the gene product. DNA shuffling by random fragmentation and PCR reassembly of gene fragments and synthetic oligonucleotides may be used to engineer the nucleotide sequences. For example, oligonucleotide-mediated site-directed mutagenesis may be used to introduce mutations that create new restriction sites, alter glycosylation patterns, change codon preference, produce splice variants, and so forth.

[0166] The nucleotides of the present invention may be subjected to DNA shuffling techniques such as MOLECULARBREEDING (Maxygen Inc., Santa Clara Calif.; described in U.S. Pat. No. 5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F. C. et al. (1999) Nat. Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or improve the biological properties of PRTS, such as its biological or enzymatic activity or its ability to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene variants is produced using PCR-mediated recombination of gene fragments. The library is then subjected to selection or screening procedures that identify those gene variants with the desired properties. These preferred variants may then be pooled and further subjected to recursive rounds of DNA shuffling and selection/screening. Thus, genetic diversity is created through “artificial” breeding and rapid molecular evolution. For example, fragments of a single gene containing random point mutations may be recombined, screened, and then reshuffled until the desired properties are optimized. Alternatively, fragments of a given gene may be recombined with fragments of homologous genes in the same gene family, either from the same or different species, thereby maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable manner.

[0167] In another embodiment, sequences encoding PRTS may be synthesized, in whole or in part, using chemical methods well known in the art. (See, e.g., Caruthers, M. H. et al. (1980) Nucleic Acids Symp. Ser. 7:215-223; and Horn, T. et al. (1980) Nucleic Acids Symp. Ser. 7:225-232.) Alternatively, PRTS itself or a fragment thereof may be synthesized using chemical methods. For example, peptide synthesis can be performed using various solution-phase or solid-phase techniques. (See, e.g., Creighton, T. (1984) Proteins, Structures and Molecular Properties, W H Freeman, New York N.Y., pp. 55-60; and Roberge, J. Y. et al. (1995) Science 269:202-204.) Automated synthesis may be achieved using the ABI 431A peptide synthesizer (Applied Biosystems). Additionally, the amino acid sequence of PRTS, or any part thereof, may be altered during direct synthesis and/or combined with sequences from other proteins, or any part thereof, to produce a variant polypeptide or a polypeptide having a sequence of a naturally occurring polypeptide.

[0168] The peptide may be substantially purified by preparative high performance liquid chromatography. (See, e.g., Chicz, R. M. and F. Z. Regnier (1990) Methods Enzymol. 182:392-421.) The composition of the synthetic peptides may be confirmed by amino acid analysis or by sequencing. (See, e.g., Creighton, supra, pp. 28-53.)

[0169] In order to express a biologically active PRTS, the nucleotide sequences encoding PRTS or derivatives thereof may be inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for transcriptional and translational control of the inserted coding sequence in a suitable host. These elements include regulatory sequences, such as enhancers, constitutive and inducible promoters, and 5′ and 3′ untranslated regions in the vector and in polynucleotide sequences encoding PRTS. Such elements may vary in their strength and specificity. Specific initiation signals may also be used to achieve more efficient translation of sequences encoding PRTS. Such signals include the ATG initiation codon and adjacent sequences, e.g., the Kozak sequence. In cases where sequences encoding PRTS and its initiation codon and upstream regulatory sequences are inserted into the appropriate expression vector, no additional transcriptional or translational control signals may be needed. However, in cases where only coding sequence, or a fragment thereof, is inserted, exogenous translational control signals including an in-frame ATG initiation codon should be provided by the vector. Exogenous translational elements and initiation codons may be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers appropriate for the particular host cell system used. (See, e.g., Scharf, D. et al. (1994) Results Probl. Cell Differ. 20:125-162.)
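The role of the ATG initiation codon and its adjacent sequence can be illustrated with a Python sketch that locates the first ATG in a construct and inspects the two positions most often emphasized in descriptions of the Kozak context: a purine (A or G) three bases upstream of the ATG, and a G immediately after it. These two criteria, the function name, and the example sequences are assumptions of the sketch; the specification itself refers only to “the Kozak sequence” without defining it.

```python
def kozak_context(sequence: str) -> dict:
    """Inspect the sequence context of the first ATG in a DNA string.

    Position -3 is three bases upstream of the A of ATG; position +4 is the
    base immediately following the G of ATG.
    """
    seq = sequence.upper()
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no ATG initiation codon found")
    minus3 = seq[start - 3] if start >= 3 else None
    plus4 = seq[start + 3] if start + 3 < len(seq) else None
    return {
        "atg_index": start,
        "purine_at_minus3": minus3 in ("A", "G"),
        "g_at_plus4": plus4 == "G",
    }


if __name__ == "__main__":
    # Hypothetical construct fragments, not taken from the Sequence Listing.
    print(kozak_context("GCCACCATGGCG"))
    print(kozak_context("TTTTTTATGCAA"))
```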

[0170] Methods which are well known to those skilled in the art may be used to construct expression vectors containing sequences encoding PRTS and appropriate transcriptional and translational control elements. These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview N.Y., ch. 4, 8, and 16-17; Ausubel, F. M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons, New York N.Y., ch. 9, 13, and 16.)

[0171] A variety of expression vector/host systems may be utilized to contain and express sequences encoding PRTS. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus); plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV, or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell systems. (See, e.g., Sambrook, supra; Ausubel, supra; Van Heeke, G. and S. M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Engelhard, E. K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945; Takamatsu, N. (1987) EMBO J. 6:307-311; The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York N.Y., pp. 191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington, J. J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di Nicola, M. et al. (1998) Cancer Gen. Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci. USA 90(13):6340-6344; Buller, R. M. et al. (1985) Nature 317(6040):813-815; McGregor, D. P. et al. (1994) Mol. Immunol. 31(3):219-226; and Verma, I. M. and N. Somia (1997) Nature 389:239-242.) The invention is not limited by the host cell employed.

[0172] In bacterial systems, a number of cloning and expression vectors may be selected depending upon the use intended for polynucleotide sequences encoding PRTS. For example, routine cloning, subcloning, and propagation of polynucleotide sequences encoding PRTS can be achieved using a multifunctional E. coli vector such as PBLUESCRIPT (Stratagene, La Jolla Calif.) or PSPORT1 plasmid (Life Technologies). Ligation of sequences encoding PRTS into the vector's multiple cloning site disrupts the lacZ gene, allowing a colorimetric screening procedure for identification of transformed bacteria containing recombinant molecules. In addition, these vectors may be useful for in vitro transcription, dideoxy sequencing, single strand rescue with helper phage, and creation of nested deletions in the cloned sequence. (See, e.g., Van Heeke, G. and S. M. Schuster (1989) J. Biol. Chem. 264:5503-5509.) When large quantities of PRTS are needed, e.g., for the production of antibodies, vectors which direct high level expression of PRTS may be used. For example, vectors containing the strong, inducible SP6 or T7 bacteriophage promoter may be used.

[0173] Yeast expression systems may be used for production of PRTS. A number of vectors containing constitutive or inducible promoters, such as alpha factor, alcohol oxidase, and PGH promoters, may be used in the yeast Saccharomyces cerevisiae or Pichia pastoris. In addition, such vectors direct either the secretion or intracellular retention of expressed proteins and enable integration of foreign sequences into the host genome for stable propagation. (See, e.g., Ausubel, 1995, supra; Bitter, G. A. et al. (1987) Methods Enzymol. 153:516-544; and Scorer, C. A. et al. (1994) Bio/Technology 12:181-184.)

[0174] Plant systems may also be used for expression of PRTS. Transcription of sequences encoding PRTS may be driven by viral promoters, e.g., the 35S and 19S promoters of CaMV used alone or in combination with the omega leader sequence from TMV (Takamatsu, N. (1987) EMBO J. 6:307-311). Alternatively, plant promoters such as the small subunit of RUBISCO or heat shock promoters may be used. (See, e.g., Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; and Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105.) These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. (See, e.g., The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York N.Y., pp. 191-196.)

[0175] In mammalian cells, a number of viral-based expression systems may be utilized. In cases where an adenovirus is used as an expression vector, sequences encoding PRTS may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a non-essential E1 or E3 region of the viral genome may be used to obtain infective virus which expresses PRTS in host cells. (See, e.g., Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659.) In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells. SV40 or EBV-based vectors may also be used for high-level protein expression.

[0176] Human artificial chromosomes (HACs) may also be employed to deliver larger fragments of DNA than can be contained in and expressed from a plasmid. HACs of about 6 kb to 10 Mb are constructed and delivered via conventional delivery methods (liposomes, polycationic amino polymers, or vesicles) for therapeutic purposes. (See, e.g., Harrington, J. J. et al. (1997) Nat. Genet. 15:345-355.)

[0177] For long term production of recombinant proteins in mammalian systems, stable expression of PRTS in cell lines is preferred. For example, sequences encoding PRTS can be transformed into cell lines using expression vectors which may contain viral origins of replication and/or endogenous expression elements and a selectable marker gene on the same or on a separate vector. Following the introduction of the vector, cells may be allowed to grow for about 1 to 2 days in enriched media before being switched to selective media. The purpose of the selectable marker is to confer resistance to a selective agent, and its presence allows growth and recovery of cells which successfully express the introduced sequences. Resistant clones of stably transformed cells may be propagated using tissue culture techniques appropriate to the cell type.

[0178] Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase and adenine phosphoribosyltransferase genes, for use in tk⁻ and apr⁻ cells, respectively. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823.) Also, antimetabolite, antibiotic, or herbicide resistance can be used as the basis for selection. For example, dhfr confers resistance to methotrexate; neo confers resistance to the aminoglycosides neomycin and G-418; als confers resistance to chlorsulfuron; and pat, which encodes phosphinothricin acetyltransferase, confers resistance to phosphinothricin. (See, e.g., Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14.) Additional selectable genes have been described, e.g., trpB and hisD, which alter cellular requirements for metabolites. (See, e.g., Hartman, S. C. and R. C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051.) Visible markers, e.g., anthocyanins, green fluorescent proteins (GFP; Clontech), β-glucuronidase and its substrate β-glucuronide, or luciferase and its substrate luciferin may be used. These markers can be used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system. (See, e.g., Rhodes, C. A. (1995) Methods Mol. Biol. 55:121-131.)

[0179] Although the presence/absence of marker gene expression suggests that the gene of interest is also present, the presence and expression of the gene may need to be confirmed. For example, if the sequence encoding PRTS is inserted within a marker gene sequence, transformed cells containing sequences encoding PRTS can be identified by the absence of marker gene function. Alternatively, a marker gene can be placed in tandem with a sequence encoding PRTS under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem gene as well.

[0180] In general, host cells that contain the nucleic acid sequence encoding PRTS and that express PRTS may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridizations, PCR amplification, and protein bioassay or immunoassay techniques which include membrane, solution, or chip based technologies for the detection and/or quantification of nucleic acid or protein sequences.

[0181] Immunological methods for detecting and measuring the expression of PRTS using either specific polyclonal or monoclonal antibodies are known in the art. Examples of such techniques include enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on PRTS is preferred, but a competitive binding assay may be employed. These and other assays are well known in the art. (See, e.g., Hampton, R. et al. (1990) Serological Methods, a Laboratory Manual, APS Press, St. Paul Minn., Sect. IV; Coligan, J. E. et al. (1997) Current Protocols in Immunology, Greene Pub. Associates and Wiley-Interscience, New York N.Y.; and Pound, J. D. (1998) Immunochemical Protocols, Humana Press, Totowa N.J.)

[0182] A wide variety of labels and conjugation techniques are known by those skilled in the art and may be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization or PCR probes for detecting sequences related to polynucleotides encoding PRTS include oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide. Alternatively, the sequences encoding PRTS, or any fragments thereof, may be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3, or SP6 and labeled nucleotides. These procedures may be conducted using a variety of commercially available kits, such as those provided by Amersham Pharmacia Biotech, Promega (Madison Wis.), and US Biochemical. Suitable reporter molecules or labels which may be used for ease of detection include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, magnetic particles, and the like.

[0183] Host cells transformed with nucleotide sequences encoding PRTS may be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The protein produced by a transformed cell may be secreted or retained intracellularly depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides which encode PRTS may be designed to contain signal sequences which direct secretion of PRTS through a prokaryotic or eukaryotic cell membrane.

[0184] In addition, a host cell strain may be chosen for its ability to modulate expression of the inserted sequences or to process the expressed protein in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which cleaves a “prepro” or “pro” form of the protein may also be used to specify protein targeting, folding, and/or activity. Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities (e.g., CHO, HeLa, MDCK, HEK293, and WI38) are available from the American Type Culture Collection (ATCC, Manassas Va.) and may be chosen to ensure the correct modification and processing of the foreign protein.

[0185] In another embodiment of the invention, natural, modified, or recombinant nucleic acid sequences encoding PRTS may be ligated to a heterologous sequence resulting in translation of a fusion protein in any of the aforementioned host systems. For example, a chimeric PRTS protein containing a heterologous moiety that can be recognized by a commercially available antibody may facilitate the screening of peptide libraries for inhibitors of PRTS activity. Heterologous protein and peptide moieties may also facilitate purification of fusion proteins using commercially available affinity matrices. Such moieties include, but are not limited to, glutathione S-transferase (GST), maltose binding protein (MBP), thioredoxin (Trx), calmodulin binding peptide (CBP), 6-His, FLAG, c-myc, and hemagglutinin (HA). GST, MBP, Trx, CBP, and 6-His enable purification of their cognate fusion proteins on immobilized glutathione, maltose, phenylarsine oxide, calmodulin, and metal-chelate resins, respectively. FLAG, c-myc, and hemagglutinin (HA) enable immunoaffinity purification of fusion proteins using commercially available monoclonal and polyclonal antibodies that specifically recognize these epitope tags. A fusion protein may also be engineered to contain a proteolytic cleavage site located between the PRTS encoding sequence and the heterologous protein sequence, so that PRTS may be cleaved away from the heterologous moiety following purification. Methods for fusion protein expression and purification are discussed in Ausubel (1995, supra, ch. 10). A variety of commercially available kits may also be used to facilitate expression and purification of fusion proteins.
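The pairing of each fusion moiety with its purification route, as listed above, can be captured in a small lookup table. The Python sketch below is only a restatement of that list in code form; the dictionary and function names are illustrative.

```python
# Affinity matrices for each fusion moiety, in the order given in the text.
AFFINITY_MATRIX = {
    "GST": "immobilized glutathione resin",
    "MBP": "immobilized maltose resin",
    "Trx": "immobilized phenylarsine oxide resin",
    "CBP": "immobilized calmodulin resin",
    "6-His": "metal-chelate resin",
}

# Epitope tags purified by immunoaffinity using tag-specific antibodies.
EPITOPE_TAGS = {"FLAG", "c-myc", "HA"}


def purification_route(tag: str) -> str:
    """Return the purification approach described for a given fusion tag."""
    if tag in AFFINITY_MATRIX:
        return "affinity purification on " + AFFINITY_MATRIX[tag]
    if tag in EPITOPE_TAGS:
        return "immunoaffinity purification with anti-" + tag + " antibody"
    raise KeyError(f"tag not listed in the text: {tag!r}")


if __name__ == "__main__":
    for tag in ("GST", "6-His", "FLAG"):
        print(tag, "->", purification_route(tag))
```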

[0186] In a further embodiment of the invention, synthesis of radiolabeled PRTS may be achieved in vitro using the TNT rabbit reticulocyte lysate or wheat germ extract system (Promega). These systems couple transcription and translation of protein-coding sequences operably associated with the T7, T3, or SP6 promoters. Translation takes place in the presence of a radiolabeled amino acid precursor, for example, ³⁵S-methionine.

[0187] PRTS of the present invention or fragments thereof may be used to screen for compounds that specifically bind to PRTS. At least one and up to a plurality of test compounds may be screened for specific binding to PRTS. Examples of test compounds include antibodies, oligonucleotides, proteins (e.g., receptors), or small molecules.

[0188] In one embodiment, the compound thus identified is closely related to the natural ligand of PRTS, e.g., a ligand or fragment thereof, a natural substrate, a structural or functional mimetic, or a natural binding partner. (See, e.g., Coligan, J. E. et al. (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the compound can be closely related to the natural receptor to which PRTS binds, or to at least a fragment of the receptor, e.g., the ligand binding site. In either case, the compound can be rationally designed using known techniques. In one embodiment, screening for these compounds involves producing appropriate cells which express PRTS, either as a secreted protein or on the cell membrane. Preferred cells include cells from mammals, yeast, Drosophila, or E. coli. Cells expressing PRTS or cell membrane fractions which contain PRTS are then contacted with a test compound and binding, stimulation, or inhibition of activity of either PRTS or the compound is analyzed.

[0189] An assay may simply test binding of a test compound to the polypeptide, wherein binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label. For example, the assay may comprise the steps of combining at least one test compound with PRTS, either in solution or affixed to a solid support, and detecting the binding of PRTS to the compound. Alternatively, the assay may detect or measure binding of a test compound in the presence of a labeled competitor. Additionally, the assay may be carried out using cell-free preparations, chemical libraries, or natural product mixtures, and the test compound(s) may be free in solution or affixed to a solid support.

[0190] PRTS of the present invention or fragments thereof may be used to screen for compounds that modulate the activity of PRTS. Such compounds may include agonists, antagonists, or partial or inverse agonists. In one embodiment, an assay is performed under conditions permissive for PRTS activity, wherein PRTS is combined with at least one test compound, and the activity of PRTS in the presence of a test compound is compared with the activity of PRTS in the absence of the test compound. A change in the activity of PRTS in the presence of the test compound is indicative of a compound that modulates the activity of PRTS. Alternatively, a test compound is combined with an in vitro or cell-free system comprising PRTS under conditions suitable for PRTS activity, and the assay is performed. In either of these assays, a test compound which modulates the activity of PRTS may do so indirectly and need not come in direct contact with PRTS. At least one and up to a plurality of test compounds may be screened.
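The comparison described in this paragraph, activity with a test compound versus activity without it, can be illustrated with a minimal Python sketch. The function name, the compound labels, and the 20% threshold are hypothetical choices made for illustration only; the specification does not prescribe any particular cutoff or data format.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    compound: str
    fold_change: float  # activity with compound / activity without compound
    call: str           # "activator", "inhibitor", or "no effect"


def classify_compounds(baseline_activity, activities, threshold=0.2):
    """Compare activity measured with each test compound to the baseline.

    baseline_activity: activity measured in the absence of test compound.
    activities: dict mapping compound name -> activity with that compound.
    threshold: fractional change treated as a real effect (an assumed,
               illustrative value, not taken from the specification).
    """
    if baseline_activity <= 0:
        raise ValueError("baseline activity must be positive")
    results = []
    for name, activity in activities.items():
        fold = activity / baseline_activity
        if fold >= 1 + threshold:
            call = "activator"
        elif fold <= 1 - threshold:
            call = "inhibitor"
        else:
            call = "no effect"
        results.append(ScreenResult(name, fold, call))
    return results


# Hypothetical readings in arbitrary activity units.
for result in classify_compounds(100.0, {"cmpd_A": 35.0, "cmpd_B": 180.0}):
    print(result.compound, round(result.fold_change, 2), result.call)
```

A real screen would add replicates and a statistical test; the fixed threshold here only stands in for that decision.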

[0191] In another embodiment, polynucleotides encoding PRTS or their mammalian homologs may be “knocked out” in an animal model system using homologous recombination in embryonic stem (ES) cells. Such techniques are well known in the art and are useful for the generation of animal models of human disease. (See, e.g., U.S. Pat. No. 5,175,383 and U.S. Pat. No. 5,767,337.) For example, mouse ES cells, such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture. The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene, e.g., the neomycin phosphotransferase gene (neo; Capecchi, M. R. (1989) Science 244:1288-1292). The vector integrates into the corresponding region of the host genome by homologous recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to knock out a gene of interest in a tissue- or developmental stage-specific manner (Marth, J. D. (1996) J. Clin. Invest. 97:1999-2002; Wagner, K. U. et al. (1997) Nucleic Acids Res. 25:4323-4330). Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.

[0192] Polynucleotides encoding PRTS may also be manipulated in vitro in ES cells derived from human blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J. A. et al. (1998) Science 282:1145-1147).

[0193] Polynucleotides encoding PRTS can also be used to create “knockin” humanized animals (pigs) or transgenic animals (mice or rats) to model human disease. With knockin technology, a region of a polynucleotide encoding PRTS is injected into animal ES cells, and the injected sequence integrates into the animal cell genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to overexpress PRTS, e.g., by secreting PRTS in its milk, may also serve as a convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).

[0194] Therapeutics

[0195] Chemical and structural similarity, e.g., in the context of sequences and motifs, exists between regions of PRTS and proteases. In addition, the expression of PRTS is closely associated with neurological, cardiovascular, hemic, prostate, endocrine, reproductive, immune system, bone and tumorous tissues and Alzheimer's disease. Therefore, PRTS appears to play a role in gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders. In the treatment of disorders associated with increased PRTS expression or activity, it is desirable to decrease the expression or activity of PRTS. In the treatment of disorders associated with decreased PRTS expression or activity, it is desirable to increase the expression or activity of PRTS.

[0196] Therefore, in one embodiment, PRTS or a fragment or derivative thereof may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of PRTS. Examples of such disorders include, but are not limited to, a gastrointestinal disorder, such as dysphagia, peptic esophagitis, esophageal spasm, esophageal stricture, esophageal carcinoma, dyspepsia, indigestion, gastritis, gastric carcinoma, anorexia, nausea, emesis, gastroparesis, antral or pyloric edema, abdominal angina, pyrosis, gastroenteritis, intestinal obstruction, infections of the intestinal tract, peptic ulcer, cholelithiasis, cholecystitis, cholestasis, pancreatitis, pancreatic carcinoma, biliary tract disease, hepatitis, hyperbilirubinemia, cirrhosis, passive congestion of the liver, hepatoma, infectious colitis, ulcerative colitis, ulcerative proctitis, Crohn's disease, Whipple's disease, Mallory-Weiss syndrome, colonic carcinoma, colonic obstruction, irritable bowel syndrome, short bowel syndrome, diarrhea, constipation, gastrointestinal hemorrhage, acquired immunodeficiency syndrome (AIDS) enteropathy, jaundice, hepatic encephalopathy, hepatorenal syndrome, hepatic steatosis, hemochromatosis, Wilson's disease, alpha₁-antitrypsin deficiency, Reye's syndrome, primary sclerosing cholangitis, liver infarction, portal vein obstruction and thrombosis, centrilobular necrosis, peliosis hepatis, hepatic vein thrombosis, veno-occlusive disease, preeclampsia, eclampsia, acute fatty liver of pregnancy, intrahepatic cholestasis of pregnancy, and hepatic tumors including nodular hyperplasias, adenomas, and carcinomas; a cardiovascular disorder, such as arteriovenous fistula, atherosclerosis, hypertension, vasculitis, Raynaud's disease, aneurysms, arterial dissections, varicose veins, thrombophlebitis and phlebothrombosis, vascular tumors, and complications of thrombolysis, balloon angioplasty, vascular replacement, and coronary artery bypass graft surgery, congestive heart failure, ischemic heart disease, angina pectoris, myocardial infarction, hypertensive heart disease, degenerative valvular heart disease, calcific aortic valve stenosis, congenitally bicuspid aortic valve, mitral annular calcification, mitral valve prolapse, rheumatic fever and rheumatic heart disease, infective endocarditis, nonbacterial thrombotic endocarditis, endocarditis of systemic lupus erythematosus, carcinoid heart disease, cardiomyopathy, myocarditis, pericarditis, neoplastic heart disease, congenital heart disease, and complications of cardiac transplantation; an autoimmune/inflammatory disorder, such as acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, atherosclerotic plaque rupture, autoimmune hemolytic anemia, autoimmune thyroiditis, autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, episodic lymphopenia with lymphocytotoxins, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, degradation of articular cartilage, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and trauma; a cell proliferative disorder such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; a developmental disorder, such as renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, bone resorption, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation), Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, age-related macular degeneration, and sensorineural hearing loss; an epithelial disorder, such as dyshidrotic eczema, allergic contact dermatitis, keratosis pilaris, melasma, vitiligo, actinic keratosis, basal cell carcinoma, squamous cell carcinoma, seborrheic keratosis, folliculitis, herpes simplex, herpes zoster, varicella, candidiasis, dermatophytosis, scabies, insect bites, cherry angioma, keloid, dermatofibroma, acrochordons, urticaria, transient acantholytic dermatosis, xerosis, eczema, atopic dermatitis, contact dermatitis, hand eczema, nummular eczema, lichen simplex chronicus, asteatotic eczema, stasis dermatitis and stasis ulceration, seborrheic dermatitis, psoriasis, lichen planus, pityriasis rosea, impetigo, ecthyma, dermatophytosis, tinea versicolor, warts, acne vulgaris, acne rosacea, pemphigus vulgaris, pemphigus foliaceus, paraneoplastic pemphigus, bullous pemphigoid, herpes gestationis, dermatitis herpetiformis, linear IgA disease, epidermolysis bullosa acquisita, dermatomyositis, lupus erythematosus, scleroderma and morphea, erythroderma, alopecia, figurate skin lesions, telangiectasias, hypopigmentation, hyperpigmentation, vesicles/bullae, exanthems, cutaneous drug reactions, papulonodular skin lesions, chronic non-healing wounds, photosensitivity diseases, epidermolysis bullosa simplex, epidermolytic hyperkeratosis, epidermolytic and nonepidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, ichthyosis exfoliativa, keratosis palmaris et plantaris, keratosis palmoplantaris, palmoplantar keratoderma, keratosis punctata, Meesmann's corneal dystrophy, pachyonychia congenita, white sponge nevus, steatocystoma multiplex, epidermal nevi/epidermolytic hyperkeratosis type, monilethrix, trichothiodystrophy, chronic hepatitis/cryptogenic cirrhosis, and colorectal hyperplasia; a neurological disorder, such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central nervous system including Down syndrome, cerebral palsy, neuroskeletal disorders, autonomic nervous system disorders, cranial nerve disorders, spinal cord diseases, muscular dystrophy and other neuromuscular disorders, peripheral nervous system disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathesia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, Tourette's disorder, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia; and a reproductive disorder, such as infertility, including tubal disease, ovulatory defects, and endometriosis, a disorder of prolactin production, a disruption of the estrous cycle, a disruption of the menstrual cycle, polycystic ovary syndrome, ovarian hyperstimulation syndrome, an endometrial or ovarian tumor, a uterine fibroid, autoimmune disorders, an ectopic pregnancy, and teratogenesis; cancer of the breast, fibrocystic breast disease, and galactorrhea; a disruption of spermatogenesis, abnormal sperm physiology, cancer of the testis, cancer of the prostate, benign prostatic hyperplasia, prostatitis, Peyronie's disease, impotence, carcinoma of the male breast, and gynecomastia.

[0197] In another embodiment, a vector capable of expressing PRTS or a fragment or derivative thereof may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of PRTS including, but not limited to, those described above.

[0198] In a further embodiment, a composition comprising a substantially purified PRTS in conjunction with a suitable pharmaceutical carrier may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of PRTS including, but not limited to, those provided above.

[0199] In still another embodiment, an agonist which modulates the activity of PRTS may be administered to a subject to treat or prevent a disorder associated with decreased expression or activity of PRTS including, but not limited to, those listed above.

[0200] In a further embodiment, an antagonist of PRTS may be administered to a subject to treat or prevent a disorder associated with increased expression or activity of PRTS. Examples of such disorders include, but are not limited to, those gastrointestinal, cardiovascular, autoimmune/inflammatory, cell proliferative, developmental, epithelial, neurological, and reproductive disorders described above. In one aspect, an antibody which specifically binds PRTS may be used directly as an antagonist or indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells or tissues which express PRTS.

[0201] In an additional embodiment, a vector expressing the complement of the polynucleotide encoding PRTS may be administered to a subject to treat or prevent a disorder associated with increased expression or activity of PRTS including, but not limited to, those described above.

[0202] In other embodiments, any of the proteins, antagonists, antibodies, agonists, complementary sequences, or vectors of the invention may be administered in combination with other appropriate therapeutic agents. Selection of the appropriate agents for use in combination therapy may be made by one of ordinary skill in the art, according to conventional pharmaceutical principles. The combination of therapeutic agents may act synergistically to effect the treatment or prevention of the various disorders described above. Using this approach, one may be able to achieve therapeutic efficacy with lower dosages of each agent, thus reducing the potential for adverse side effects.

[0203] An antagonist of PRTS may be produced using methods which are generally known in the art. In particular, purified PRTS may be used to produce antibodies or to screen libraries of pharmaceutical agents to identify those which specifically bind PRTS. Antibodies to PRTS may also be generated using methods that are well known in the art. Such antibodies may include, but are not limited to, polyclonal, monoclonal, chimeric, and single chain antibodies, Fab fragments, and fragments produced by a Fab expression library. Neutralizing antibodies (i.e., those which inhibit dimer formation) are generally preferred for therapeutic use.

[0204] For the production of antibodies, various hosts including goats, rabbits, rats, mice, humans, and others may be immunized by injection with PRTS or with any fragment or oligopeptide thereof which has immunogenic properties. Depending on the host species, various adjuvants may be used to increase immunological response. Such adjuvants include, but are not limited to, Freund's, mineral gels such as aluminum hydroxide, and surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, KLH, and dinitrophenol. Among adjuvants used in humans, BCG (bacilli Calmette-Guerin) and Corynebacterium parvum are especially preferable.

[0205] It is preferred that the oligopeptides, peptides, or fragments used to induce antibodies to PRTS have an amino acid sequence consisting of at least about 5 amino acids, and generally will consist of at least about 10 amino acids. It is also preferable that these oligopeptides, peptides, or fragments are identical to a portion of the amino acid sequence of the natural protein. Short stretches of PRTS amino acids may be fused with those of another protein, such as KLH, and antibodies to the chimeric molecule may be produced.
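As an illustration of the length guidance above, the following Python sketch enumerates candidate peptides that are identical to portions of a protein sequence. The function name, the default window of 10 residues, and the example sequence are hypothetical; the sequence is not a PRTS sequence, and the sketch makes no attempt to predict immunogenicity.

```python
def candidate_peptides(sequence, length=10, step=1):
    """List peptides of a fixed length taken directly from a protein sequence.

    Returns (start_position, peptide) pairs, with positions numbered from 1.
    The minimum length of 5 mirrors the "at least about 5 amino acids"
    guidance; the default of 10 mirrors the "generally at least about 10".
    """
    if length < 5:
        raise ValueError("peptides shorter than 5 residues are not considered")
    if step < 1:
        raise ValueError("step must be at least 1")
    sequence = sequence.upper()
    return [
        (start + 1, sequence[start:start + length])
        for start in range(0, len(sequence) - length + 1, step)
    ]


# Hypothetical 15-residue sequence, used only to show the windowing.
for position, peptide in candidate_peptides("MKTLLVAGAVLLSPA", length=10, step=5):
    print(position, peptide)
```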

[0206] Monoclonal antibodies to PRTS may be prepared using any technique which provides for the production of antibody molecules by continuous cell lines in culture. These include, but are not limited to, the hybridoma technique, the human B-cell hybridoma technique, and the EBV-hybridoma technique. (See, e.g., Kohler, G. et al. (1975) Nature 256:495-497; Kozbor, D. et al. (1985) J. Immunol. Methods 81:31-42; Cote, R. J. et al. (1983) Proc. Natl. Acad. Sci. USA 80:2026-2030; and Cole, S. P. et al. (1984) Mol. Cell Biol. 62:109-120.)

[0207] In addition, techniques developed for the production of “chimeric antibodies,” such as the splicing of mouse antibody genes to human antibody genes to obtain a molecule with appropriate antigen specificity and biological activity, can be used. (See, e.g., Morrison, S. L. et al. (1984) Proc. Natl. Acad. Sci. USA 81:6851-6855; Neuberger, M. S. et al. (1984) Nature 312:604-608; and Takeda, S. et al. (1985) Nature 314:452-454.) Alternatively, techniques described for the production of single chain antibodies may be adapted, using methods known in the art, to produce PRTS-specific single chain antibodies. Antibodies with related specificity, but of distinct idiotypic composition, may be generated by chain shuffling from random combinatorial immunoglobulin libraries. (See, e.g., Burton, D. R. (1991) Proc. Natl. Acad. Sci. USA 88:10134-10137.)

[0208] Antibodies may also be produced by inducing in vivo production in the lymphocyte population or by screening immunoglobulin libraries or panels of highly specific binding reagents as disclosed in the literature. (See, e.g., Orlandi, R. et al. (1989) Proc. Natl. Acad. Sci. USA 86:3833-3837; Winter, G. et al. (1991) Nature 349:293-299.)

[0209] Antibody fragments which contain specific binding sites for PRTS may also be generated. For example, such fragments include, but are not limited to, F(ab′)₂ fragments produced by pepsin digestion of the antibody molecule and Fab fragments generated by reducing the disulfide bridges of the F(ab′)₂ fragments. Alternatively, Fab expression libraries may be constructed to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity. (See, e.g., Huse, W. D. et al. (1989) Science 246:1275-1281.)

[0210] Various immunoassays may be used for screening to identify antibodies having the desired specificity. Numerous protocols for competitive binding or immunoradiometric assays using either polyclonal or monoclonal antibodies with established specificities are well known in the art. Such immunoassays typically involve the measurement of complex formation between PRTS and its specific antibody. A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering PRTS epitopes is generally used, but a competitive binding assay may also be employed (Pound, supra).

[0211] Various methods such as Scatchard analysis in conjunction with radioimmunoassay techniques may be used to assess the affinity of antibodies for PRTS. Affinity is expressed as an association constant, Kₐ, which is defined as the molar concentration of PRTS-antibody complex divided by the molar concentrations of free antigen and free antibody under equilibrium conditions. The Kₐ determined for a preparation of polyclonal antibodies, which are heterogeneous in their affinities for multiple PRTS epitopes, represents the average affinity, or avidity, of the antibodies for PRTS. The Kₐ determined for a preparation of monoclonal antibodies, which are monospecific for a particular PRTS epitope, represents a true measure of affinity. High-affinity antibody preparations with Kₐ ranging from about 10⁹ to 10¹² L/mole are preferred for use in immunoassays in which the PRTS-antibody complex must withstand rigorous manipulations. Low-affinity antibody preparations with Kₐ ranging from about 10⁶ to 10⁷ L/mole are preferred for use in immunopurification and similar procedures which ultimately require dissociation of PRTS, preferably in active form, from the antibody (Catty, D. (1988) Antibodies, Volume I: A Practical Approach, IRL Press, Washington D.C.; Liddell, J. E. and A. Cryer (1991) A Practical Guide to Monoclonal Antibodies, John Wiley & Sons, New York N.Y.).
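A minimal Python sketch of a Scatchard analysis follows, assuming a single class of independent binding sites (the monoclonal case). Under that assumption, the ratio of bound to free antigen is linear in bound antigen, B/F = Kₐ(Bmax − B), so the slope of a straight-line fit is −Kₐ. The function name and the synthetic data are hypothetical; a polyclonal preparation would generally give a curved plot, which this sketch does not model.

```python
def scatchard_fit(bound, free):
    """Estimate the association constant and binding capacity.

    bound, free: equilibrium molar concentrations of bound and free antigen.
    Fits bound/free against bound by ordinary least squares; for a single
    class of sites the slope is -K_a and the x-intercept is Bmax.
    Returns (k_a in L/mol, bmax in mol/L).
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired measurements")
    ratios = [b / f for b, f in zip(bound, free)]
    n = len(bound)
    mean_x = sum(bound) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in bound)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bound, ratios))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    k_a = -slope
    bmax = intercept / k_a
    return k_a, bmax


# Synthetic, noise-free data generated from assumed values of K_a and Bmax.
assumed_k_a, assumed_bmax = 1e9, 1e-9
free_conc = [1e-10, 3e-10, 1e-9, 3e-9, 1e-8]
bound_conc = [assumed_bmax * assumed_k_a * f / (1 + assumed_k_a * f) for f in free_conc]
k_a, bmax = scatchard_fit(bound_conc, free_conc)
print(f"K_a = {k_a:.3e} L/mol, Bmax = {bmax:.3e} mol/L")
```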

[0212] The titer and avidity of polyclonal antibody preparations may be further evaluated to determine the quality and suitability of such preparations for certain downstream applications. For example, a polyclonal antibody preparation containing at least 1-2 mg specific antibody/ml, preferably 5-10 mg specific antibody/ml, is generally employed in procedures requiring precipitation of PRTS-antibody complexes. Procedures for evaluating antibody specificity, titer, and avidity, and guidelines for antibody quality and usage in various applications, are generally available. (See, e.g., Catty, supra, and Coligan et al. supra.)

[0213] In another embodiment of the invention, the polynucleotides encoding PRTS, or any fragment or complement thereof, may be used for therapeutic purposes. In one aspect, modifications of gene expression can be achieved by designing complementary sequences or antisense molecules (DNA, RNA, PNA, or modified oligonucleotides) to the coding or regulatory regions of the gene encoding PRTS. Such technology is well known in the art, and antisense oligonucleotides or larger fragments can be designed from various locations along the coding or control regions of sequences encoding PRTS. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa N.J.)
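The core of designing a complementary sequence is taking the reverse complement of the chosen target region. The Python sketch below shows only that step for a DNA antisense oligonucleotide; the function name and the six-base example are hypothetical, and practical design would also weigh factors such as target accessibility and oligonucleotide chemistry, which are outside this sketch.

```python
# Complement table for a DNA antisense strand; U in an RNA target pairs with A.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_dna(target):
    """Return the DNA reverse complement of a target sequence.

    target: a sense-strand region written 5' to 3', as DNA (T) or RNA (U).
    The result is written 5' to 3' and base-pairs with the target.
    """
    target = target.upper()
    invalid = set(target) - set("ACGTU")
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {sorted(invalid)}")
    return target.translate(_COMPLEMENT)[::-1]


# Hypothetical target fragment.
print(antisense_dna("AUGGCC"))
```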

[0214] In therapeutic use, any gene delivery system suitable for introduction of the antisense sequences into appropriate target cells can be used. Antisense sequences can be delivered intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g., Slater, J. E. et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K. J. et al. (1995) 9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A. D. (1990) Blood 76:271; Ausubel, supra; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems known in the art. (See, e.g., Rossi, J. J. (1995) Br. Med. Bull. 51(1):217-225; Boado, R. J. et al. (1998) J. Pharm. Sci. 87(11):1308-1315; and Morris, M. C. et al. (1997) Nucleic Acids Res. 25(14):2730-2736.)

[0215] In another embodiment of the invention, polynucleotides encoding PRTS may be used for somatic or germline gene therapy. Gene therapy may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an inherited adenosine deaminase (ADA) deficiency (Blaese, R. M. et al. (1995) Science 270:475-480; Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216; Crystal, R. G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R. G. et al. (1995) Hum. Gene Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from Factor VIII or Factor IX deficiencies (Crystal, R. G. (1995) Science 270:404-410; Verma, I. M. and N. Somia (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the case of cancers which result from unregulated cell proliferation), or (iii) express a protein which affords protection against intracellular parasites (e.g., against human retroviruses, such as human immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996) Proc. Natl. Acad. Sci. USA. 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites, such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in PRTS expression or regulation causes disease, the expression of PRTS from an appropriate population of transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.

[0216] In a further embodiment of the invention, diseases or disorders caused by deficiencies in PRTS are treated by constructing mammalian expression vectors encoding PRTS and introducing these vectors by mechanical means into PRTS-deficient cells. Mechanical transfer technologies for use with cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii) ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene transfer, and (v) the use of DNA transposons (Morgan, R. A. and W. F. Anderson (1993) Annu. Rev. Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and H. Récipon (1998) Curr. Opin. Biotechnol. 9:445-450).

[0217] Expression vectors that may be effective for the expression of PRTS include, but are not limited to, the PcDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad Calif.), PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla Calif.), and PTET-OFF, PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto Calif.). PRTS may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV), Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and H. Bujard (1992) Proc. Natl. Acad. Sci. USA 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F. M. V. and H. M. Blau (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible promoter (Rossi, F. M. V. and Blau, H. M. supra)), or (iii) a tissue-specific promoter or the native promoter of the endogenous gene encoding PRTS from a normal individual.

[0218] Commercially available liposome transformation kits (e.g., the PERFECT LIPID TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver polynucleotides to target cells in culture and require minimal effort to optimize experimental parameters. In the alternative, transformation is performed using the calcium phosphate method (Graham, F. L. and A. J. Eb (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al. (1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of these standardized mammalian transfection protocols.

[0219] In another embodiment of the invention, diseases or disorders caused by genetic defects with respect to PRTS expression are treated by constructing a retrovirus vector consisting of (i) the polynucleotide encoding PRTS under the control of an independent promoter or the retrovirus long terminal repeat (LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE) along with additional retrovirus cis-acting RNA sequences and coding sequences required for efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available (Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. USA 92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650; Bender, M. A. et al. (1987) J. Virol. 61:1639-1646; Adam, M. A. and A. D. Miller (1988) J. Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol. 72:9873-9880). U.S. Pat. No. 5,910,434 to Rigg (“Method for obtaining retrovirus packaging cell lines producing high transducing efficiency retroviral supernatant”) discloses a method for obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of retrovirus vectors, transduction of a population of cells (e.g., CD4⁺ T-cells), and the return of transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al. (1997) Blood 89:2259-2267; Bonyhadi, M. L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998) Proc. Natl. Acad. Sci. USA 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).

[0220] In the alternative, an adenovirus-based gene therapy delivery system is used to deliver polynucleotides encoding PRTS to cells which have one or more genetic abnormalities with respect to the expression of PRTS. The construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in the art. Replication defective adenovirus vectors have proven to be versatile for importing genes encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M. E. et al. (1995) Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Pat. No. 5,707,618 to Armentano (“Adenovirus vectors for gene therapy”), hereby incorporated by reference. For adenoviral vectors, see also Antinozzi, P. A. et al. (1999) Annu. Rev. Nutr. 19:511-544 and Verma, I. M. and N. Somia (1997) Nature 389:239-242, both incorporated by reference herein.

[0221] In another alternative, a herpes-based gene therapy delivery system is used to deliver polynucleotides encoding PRTS to target cells which have one or more genetic abnormalities with respect to the expression of PRTS. The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing PRTS to cells of the central nervous system, for which HSV has a tropism. The construction and packaging of herpes-based vectors are well known to those with ordinary skill in the art. A replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The construction of an HSV-1 virus vector has also been disclosed in detail in U.S. Pat. No. 5,804,413 to DeLuca (“Herpes simplex virus strains for gene transfer”), which is hereby incorporated by reference. U.S. Pat. No. 5,804,413 teaches the use of recombinant HSV d92 which consists of a genome containing at least one exogenous gene to be transferred to a cell under the control of the appropriate promoter for purposes including human gene therapy. Also taught by this patent are the construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22. For HSV vectors, see also Goins, W. F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol. 163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences, the generation of recombinant virus following the transfection of multiple plasmids containing different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.

[0222] In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to deliver polynucleotides encoding PRTS to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV), has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff, H. and K.-J. Li (1998) Curr. Opin. Biotechnol. 9:464-469). During alphavirus RNA replication, a subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic RNA replicates to higher levels than the full length genomic RNA, resulting in the overproduction of capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase). Similarly, inserting the coding sequence for PRTS into the alphavirus genome in place of the capsid-coding region results in the production of a large number of PRTS-coding RNAs and the synthesis of high levels of PRTS in vector-transduced cells. While alphavirus infection is typically associated with cell lysis within a few days, the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the needs of the gene therapy application (Dryga, S. A. et al. (1997) Virology 228:74-83). The wide host range of alphaviruses will allow the introduction of PRTS into a variety of cell types. The specific transduction of a subset of cells in a population may require the sorting of cells prior to transduction. The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA and RNA transfections, and performing alphavirus infections are well known to those with ordinary skill in the art.

[0223] Oligonucleotides derived from the transcription initiation site, e.g., between about positions −10 and +10 from the start site, may also be employed to inhibit gene expression. Similarly, inhibition can be achieved using triple helix base-pairing methodology. Triple helix pairing is useful because it causes inhibition of the ability of the double helix to open sufficiently for the binding of polymerases, transcription factors, or regulatory molecules. Recent therapeutic advances using triplex DNA have been described in the literature. (See, e.g., Gee, J. E. et al. (1994) in Huber, B. E. and B. I. Carr, Molecular and Immunologic Approaches, Futura Publishing, Mt. Kisco N.Y., pp. 163-177.) A complementary sequence or antisense molecule may also be designed to block translation of mRNA by preventing the transcript from binding to ribosomes.

[0224] Ribozymes, enzymatic RNA molecules, may also be used to catalyze the specific cleavage of RNA. The mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule to complementary target RNA, followed by endonucleolytic cleavage. For example, engineered hammerhead motif ribozyme molecules may specifically and efficiently catalyze endonucleolytic cleavage of sequences encoding PRTS.

[0225] Specific ribozyme cleavage sites within any potential RNA target are initially identified by scanning the target molecule for ribozyme cleavage sites, including the following sequences: GUA, GUU, and GUC. Once identified, short RNA sequences of between 15 and 20 ribonucleotides, corresponding to the region of the target gene containing the cleavage site, may be evaluated for secondary structural features which may render the oligonucleotide inoperable. The suitability of candidate targets may also be evaluated by testing accessibility to hybridization with complementary oligonucleotides using ribonuclease protection assays.
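
The scanning step described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `find_cleavage_windows`, the default window of 17 ribonucleotides, and the choice to center the window on the triplet are assumptions made for the example and are not taken from the specification. The sketch locates GUA, GUU, and GUC sites and extracts the surrounding 15 to 20 ribonucleotide region; it does not evaluate secondary structure or hybridization accessibility.

```python
import re

# Trinucleotide cleavage motifs named in the text.
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")


def find_cleavage_windows(rna, window=17):
    """Return (site_index, triplet, window_sequence) for each candidate site.

    `window` is a hypothetical parameter; the text only requires that the
    candidate region be between 15 and 20 ribonucleotides long.
    """
    if not 15 <= window <= 20:
        raise ValueError("window must be between 15 and 20 ribonucleotides")
    # Accept DNA-style input by converting T to U.
    rna = rna.upper().replace("T", "U")
    pattern = "(?=(" + "|".join(CLEAVAGE_TRIPLETS) + "))"
    hits = []
    # The lookahead lets overlapping sites be reported as well.
    for match in re.finditer(pattern, rna):
        site = match.start()
        # Roughly center the window on the triplet, clamped to the sequence ends.
        start = max(0, site + 1 - window // 2)
        start = min(start, max(0, len(rna) - window))
        hits.append((site, match.group(1), rna[start:start + window]))
    return hits


if __name__ == "__main__":
    target = "A" * 10 + "GUC" + "A" * 10
    for site, triplet, region in find_cleavage_windows(target, window=15):
        print(site, triplet, region)
```

Each returned region would then be passed to whatever secondary-structure prediction or ribonuclease protection assay the practitioner prefers.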

[0226] Complementary ribonucleic acid molecules and ribozymes of the invention may be prepared by any method known in the art for the synthesis of nucleic acid molecules. These include techniques for chemically synthesizing oligonucleotides such as solid phase phosphoramidite chemical synthesis. Alternatively, RNA molecules may be generated by in vitro and in vivo transcription of DNA sequences encoding PRTS. Such DNA sequences may be incorporated into a wide variety of vectors with suitable RNA polymerase promoters such as T7 or SP6. Alternatively, these cDNA constructs that synthesize complementary RNA, constitutively or inducibly, can be introduced into cell lines, cells, or tissues.

[0227] RNA molecules may be modified to increase intracellular stability and half-life. Possible modifications include, but are not limited to, the addition of flanking sequences at the 5′ and/or 3′ ends of the molecule, or the use of phosphorothioate or 2′ O-methyl rather than phosphodiester linkages within the backbone of the molecule. This concept is inherent in the production of PNAs and can be extended in all of these molecules by the inclusion of nontraditional bases such as inosine, queuosine, and wybutosine, as well as acetyl-, methyl-, thio-, and similarly modified forms of adenine, cytidine, guanine, thymine, and uridine which are not as easily recognized by endogenous endonucleases.

[0228] An additional embodiment of the invention encompasses a method for screening for a compound which is effective in altering expression of a polynucleotide encoding PRTS. Compounds which may be effective in altering expression of a specific polynucleotide may include, but are not limited to, oligonucleotides, antisense oligonucleotides, triple helix-forming oligonucleotides, transcription factors and other polypeptide transcriptional regulators, and non-macromolecular chemical entities which are capable of interacting with specific polynucleotide sequences. Effective compounds may alter polynucleotide expression by acting as either inhibitors or promoters of polynucleotide expression. Thus, in the treatment of disorders associated with increased PRTS expression or activity, a compound which specifically inhibits expression of the polynucleotide encoding PRTS may be therapeutically useful, and in the treatment of disorders associated with decreased PRTS expression or activity, a compound which specifically promotes expression of the polynucleotide encoding PRTS may be therapeutically useful.

[0229] At least one, and up to a plurality, of test compounds may be screened for effectiveness in altering expression of a specific polynucleotide. A test compound may be obtained by any method commonly known in the art, including chemical modification of a compound known to be effective in altering polynucleotide expression; selection from an existing, commercially-available or proprietary library of naturally-occurring or non-natural chemical compounds; rational design of a compound based on chemical and/or structural properties of the target polynucleotide; and selection from a library of chemical compounds created combinatorially or randomly. A sample comprising a polynucleotide encoding PRTS is exposed to at least one test compound thus obtained. The sample may comprise, for example, an intact or permeabilized cell, or an in vitro cell-free or reconstituted biochemical system. Alterations in the expression of a polynucleotide encoding PRTS are assayed by any method commonly known in the art. Typically, the expression of a specific polynucleotide is detected by hybridization with a probe having a nucleotide sequence complementary to the sequence of the polynucleotide encoding PRTS. The amount of hybridization may be quantified, thus forming the basis for a comparison of the expression of the polynucleotide both with and without exposure to one or more test compounds. Detection of a change in the expression of a polynucleotide exposed to a test compound indicates that the test compound is effective in altering the expression of the polynucleotide. A screen for a compound effective in altering expression of a specific polynucleotide can be carried out, for example, using a Schizosaccharomyces pombe gene expression system (Atkins, D. et al. (1999) U.S. Pat. No. 5,932,435; Arndt, G. M. et al. (2000) Nucleic Acids Res. 28:E15) or a human cell line such as HeLa cells (Clarke, M. L. et al. (2000) Biochem. Biophys. Res. Commun. 268:8-13). A particular embodiment of the present invention involves screening a combinatorial library of oligonucleotides (such as deoxyribonucleotides, ribonucleotides, peptide nucleic acids, and modified oligonucleotides) for antisense activity against a specific polynucleotide sequence (Bruice, T. W. et al. (1997) U.S. Pat. No. 5,686,242; Bruice, T. W. et al. (2000) U.S. Pat. No. 6,022,691).

[0230] Many methods for introducing vectors into cells or tissues are available and equally suitable for use in vivo, in vitro, and ex vivo. For ex vivo therapy, vectors may be introduced into stem cells taken from the patient and clonally propagated for autologous transplant back into that same patient. Delivery by transfection, by liposome injections, or by polycationic amino polymers may be achieved using methods which are well known in the art. (See, e.g., Goldman, C. K. et al. (1997) Nat. Biotechnol. 15:462-466.)

[0231] Any of the therapeutic methods described above may be applied to any subject in need of such therapy, including, for example, mammals such as humans, dogs, cats, cows, horses, rabbits, and monkeys.

[0232] An additional embodiment of the invention relates to the administration of a composition which generally comprises an active ingredient formulated with a pharmaceutically acceptable excipient. Excipients may include, for example, sugars, starches, celluloses, gums, and proteins. Various formulations are commonly known and are thoroughly discussed in the latest edition of Remington's Pharmaceutical Sciences (Mack Publishing, Easton Pa.). Such compositions may consist of PRTS, antibodies to PRTS, and mimetics, agonists, antagonists, or inhibitors of PRTS.

[0233] The compositions utilized in this invention may be administered by any number of routes including, but not limited to, oral, intravenous, intramuscular, intra-arterial, intramedullary, intrathecal, intraventricular, pulmonary, transdermal, subcutaneous, intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.

[0234] Compositions for pulmonary administration may be prepared in liquid or dry powder form. These compositions are generally aerosolized immediately prior to inhalation by the patient. In the case of small molecules (e.g., traditional low molecular weight organic drugs), aerosol delivery of fast-acting formulations is well-known in the art. In the case of macromolecules (e.g., larger peptides and proteins), recent developments in the field of pulmonary delivery via the alveolar region of the lung have enabled the practical delivery of drugs such as insulin to blood circulation (see, e.g., Patton, J. S. et al., U.S. Pat. No. 5,997,848). Pulmonary delivery has the advantage of administration without needle injection, and obviates the need for potentially toxic penetration enhancers.

[0235] Compositions suitable for use in the invention include compositions wherein the active ingredients are contained in an effective amount to achieve the intended purpose. The determination of an effective dose is well within the capability of those skilled in the art.

[0236] Specialized forms of compositions may be prepared for direct intracellular delivery of macromolecules comprising PRTS or fragments thereof. For example, liposome preparations containing a cell-impermeable macromolecule may promote cell fusion and intracellular delivery of the macromolecule. Alternatively, PRTS or a fragment thereof may be joined to a short cationic N-terminal portion from the HIV Tat-1 protein. Fusion proteins thus generated have been found to transduce into the cells of all tissues, including the brain, in a mouse model system (Schwarze, S. R. et al. (1999) Science 285:1569-1572).

[0237] For any compound, the therapeutically effective dose can be estimated initially either in cell culture assays, e.g., of neoplastic cells, or in animal models such as mice, rats, rabbits, dogs, monkeys, or pigs. An animal model may also be used to determine the appropriate concentration range and route of administration. Such information can then be used to determine useful doses and routes for administration in humans.

[0238] A therapeutically effective dose refers to that amount of active ingredient, for example PRTS or fragments thereof, antibodies of PRTS, and agonists, antagonists or inhibitors of PRTS, which ameliorates the symptoms or condition. Therapeutic efficacy and toxicity may be determined by standard pharmaceutical procedures in cell cultures or with experimental animals, such as by calculating the ED₅₀ (the dose therapeutically effective in 50% of the population) or LD₅₀ (the dose lethal to 50% of the population) statistics. The dose ratio of toxic to therapeutic effects is the therapeutic index, which can be expressed as the LD₅₀/ED₅₀ ratio. Compositions which exhibit large therapeutic indices are preferred. The data obtained from cell culture assays and animal studies are used to formulate a range of dosage for human use. The dosage contained in such compositions is preferably within a range of circulating concentrations that includes the ED₅₀ with little or no toxicity. The dosage varies within this range depending upon the dosage form employed, the sensitivity of the patient, and the route of administration.
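
The LD₅₀/ED₅₀ ratio defined above can be illustrated with a short Python sketch. The function names and the log-linear interpolation used to estimate a 50% dose from dose-response data are assumptions chosen for the example; the specification refers only to standard pharmaceutical procedures, and a real analysis would typically fit a full dose-response model.

```python
import math


def dose_at_half_response(doses, fractions):
    """Estimate the dose at which 50% of the population responds.

    Hypothetical helper: interpolates linearly in log10(dose) between the
    two measured points that bracket a response fraction of 0.5.
    `doses` must be positive and ascending.
    """
    points = list(zip(doses, fractions))
    for (d0, f0), (d1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            t = (0.5 - f0) / (f1 - f0)
            log_dose = math.log10(d0) + t * (math.log10(d1) - math.log10(d0))
            return 10 ** log_dose
    raise ValueError("response data do not bracket 50%")


def therapeutic_index(ld50, ed50):
    """Therapeutic index expressed as the LD50/ED50 ratio."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Illustrative numbers only, not data from the specification.
    ed50 = dose_at_half_response([1, 10, 100], [0.0, 0.5, 1.0])
    ld50 = dose_at_half_response([10, 100, 1000], [0.0, 0.25, 0.75])
    print(therapeutic_index(ld50, ed50))
```

A larger ratio means a wider margin between the effective and lethal doses, which is why compositions with large therapeutic indices are preferred.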

[0239] The exact dosage will be determined by the practitioner, in light of factors related to the subject requiring treatment. Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain the desired effect. Factors which may be taken into account include the severity of the disease state, the general health of the subject, the age, weight, and gender of the subject, time and frequency of administration, drug combination(s), reaction sensitivities, and response to therapy. Long-acting compositions may be administered every 3 to 4 days, every week, or biweekly depending on the half-life and clearance rate of the particular formulation.

[0240] Normal dosage amounts may vary from about 0.1 μg to 100,000 μg, up to a total dose of about 1 gram, depending upon the route of administration. Guidance as to particular dosages and methods of delivery is provided in the literature and generally available to practitioners in the art. Those skilled in the art will employ different formulations for nucleotides than for proteins or their inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific to particular cells, conditions, locations, etc.

[0241] Diagnostics

[0242] In another embodiment, antibodies which specifically bind PRTS may be used for the diagnosis of disorders characterized by expression of PRTS, or in assays to monitor patients being treated with PRTS or agonists, antagonists, or inhibitors of PRTS. Antibodies useful for diagnostic purposes may be prepared in the same manner as described above for therapeutics. Diagnostic assays for PRTS include methods which utilize the antibody and a label to detect PRTS in human body fluids or in extracts of cells or tissues. The antibodies may be used with or without modification, and may be labeled by covalent or non-covalent attachment of a reporter molecule. A wide variety of reporter molecules, several of which are described above, are known in the art and may be used.

[0243] A variety of protocols for measuring PRTS, including ELISAs, RIAs, and FACS, are known in the art and provide a basis for diagnosing altered or abnormal levels of PRTS expression. Normal or standard values for PRTS expression are established by combining body fluids or cell extracts taken from normal mammalian subjects, for example, human subjects, with antibodies to PRTS under conditions suitable for complex formation. The amount of standard complex formation may be quantitated by various methods, such as photometric means. Quantities of PRTS expressed in subject, control, and disease samples from biopsied tissues are compared with the standard values. Deviation between standard and subject values establishes the parameters for diagnosing disease.

[0244] In another embodiment of the invention, the polynucleotides encoding PRTS may be used for diagnostic purposes. The polynucleotides which may be used include oligonucleotide sequences, complementary RNA and DNA molecules, and PNAs. The polynucleotides may be used to detect and quantify gene expression in biopsied tissues in which expression of PRTS may be correlated with disease. The diagnostic assay may be used to determine absence, presence, and excess expression of PRTS, and to monitor regulation of PRTS levels during therapeutic intervention.

[0245] In one aspect, hybridization with PCR probes which are capable of detecting polynucleotide sequences, including genomic sequences, encoding PRTS or closely related molecules may be used to identify nucleic acid sequences which encode PRTS. The specificity of the probe, whether it is made from a highly specific region, e.g., the 5′ regulatory region, or from a less specific region, e.g., a conserved motif, and the stringency of the hybridization or amplification will determine whether the probe identifies only naturally occurring sequences encoding PRTS, allelic variants, or related sequences.

[0246] Probes may also be used for the detection of related sequences, and may have at least 50% sequence identity to any of the PRTS encoding sequences. The hybridization probes of the subject invention may be DNA or RNA and may be derived from the sequence of SEQ ID NO:22-42 or from genomic sequences including promoters, enhancers, and introns of the PRTS gene.

[0247] Means for producing specific hybridization probes for DNAs encoding PRTS include the cloning of polynucleotide sequences encoding PRTS or PRTS derivatives into vectors for the production of mRNA probes. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by means of the addition of the appropriate RNA polymerases and the appropriate labeled nucleotides. Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides such as ³²P or ³⁵S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via avidin/biotin coupling systems, and the like.

[0248] Polynucleotide sequences encoding PRTS may be used for the diagnosis of disorders associated with expression of PRTS. Examples of such disorders include, but are not limited to, a gastrointestinal disorder, such as dysphagia, peptic esophagitis, esophageal spasm, esophageal stricture, esophageal carcinoma, dyspepsia, indigestion, gastritis, gastric carcinoma, anorexia, nausea, emesis, gastroparesis, antral or pyloric edema, abdominal angina, pyrosis, gastroenteritis, intestinal obstruction, infections of the intestinal tract, peptic ulcer, cholelithiasis, cholecystitis, cholestasis, pancreatitis, pancreatic carcinoma, biliary tract disease, hepatitis, hyperbilirubinemia, cirrhosis, passive congestion of the liver, hepatoma, infectious colitis, ulcerative colitis, ulcerative proctitis, Crohn's disease, Whipple's disease, Mallory-Weiss syndrome, colonic carcinoma, colonic obstruction, irritable bowel syndrome, short bowel syndrome, diarrhea, constipation, gastrointestinal hemorrhage, acquired immunodeficiency syndrome (AIDS) enteropathy, jaundice, hepatic encephalopathy, hepatorenal syndrome, hepatic steatosis, hemochromatosis, Wilson's disease, alpha₁-antitrypsin deficiency, Reye's syndrome, primary sclerosing cholangitis, liver infarction, portal vein obstruction and thrombosis, centrilobular necrosis, peliosis hepatis, hepatic vein thrombosis, veno-occlusive disease, preeclampsia, eclampsia, acute fatty liver of pregnancy, intrahepatic cholestasis of pregnancy, and hepatic tumors including nodular hyperplasias, adenomas, and carcinomas; a cardiovascular disorder, such as arteriovenous fistula, atherosclerosis, hypertension, vasculitis, Raynaud's disease, aneurysms, arterial dissections, varicose veins, thrombophlebitis and phlebothrombosis, vascular tumors, and complications of thrombolysis, balloon angioplasty, vascular replacement, and coronary artery bypass graft surgery, congestive heart failure, ischemic heart disease, angina pectoris, myocardial infarction, hypertensive heart disease, degenerative valvular heart disease, calcific aortic valve stenosis, congenitally bicuspid aortic valve, mitral annular calcification, mitral valve prolapse, rheumatic fever and rheumatic heart disease, infective endocarditis, nonbacterial thrombotic endocarditis, endocarditis of systemic lupus erythematosus, carcinoid heart disease, cardiomyopathy, myocarditis, pericarditis, neoplastic heart disease, congenital heart disease, and complications of cardiac transplantation; an autoimmune/inflammatory disorder, such as acquired immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, atherosclerotic plaque rupture, autoimmune hemolytic anemia, autoimmune thyroiditis, autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, episodic lymphopenia with lymphocytotoxins, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, degradation of articular cartilage, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and trauma; a cell proliferative disorder such as actinic keratosis, arteriosclerosis, atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis, paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and, in particular, cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; a developmental disorder, such as renal tubular acidosis, anemia, Cushing's syndrome, achondroplastic dwarfism, Duchenne and Becker muscular dystrophy, bone resorption, epilepsy, gonadal dysgenesis, WAGR syndrome (Wilms' tumor, aniridia, genitourinary abnormalities, and mental retardation), Smith-Magenis syndrome, myelodysplastic syndrome, hereditary mucoepithelial dysplasia, hereditary keratodermas, hereditary neuropathies such as Charcot-Marie-Tooth disease and neurofibromatosis, hypothyroidism, hydrocephalus, seizure disorders such as Sydenham's chorea and cerebral palsy, spina bifida, anencephaly, craniorachischisis, congenital glaucoma, cataract, age-related macular degeneration, and sensorineural hearing loss; an epithelial disorder, such as dyshidrotic eczema, allergic contact dermatitis, keratosis pilaris, melasma, vitiligo, actinic keratosis, basal cell carcinoma, squamous cell carcinoma, seborrheic keratosis, folliculitis, herpes simplex, herpes zoster, varicella, candidiasis, dermatophytosis, scabies, insect bites, cherry angioma, keloid, dermatofibroma, acrochordons, urticaria, transient acantholytic dermatosis, xerosis, eczema, atopic dermatitis, contact dermatitis, hand eczema, nummular eczema, lichen simplex chronicus, asteatotic eczema, stasis dermatitis and stasis ulceration, seborrheic dermatitis, psoriasis, lichen planus, pityriasis rosea, impetigo, ecthyma, dermatophytosis, tinea versicolor, warts, acne vulgaris, acne rosacea, pemphigus vulgaris, pemphigus foliaceus, paraneoplastic pemphigus, bullous pemphigoid, herpes gestationis, dermatitis herpetiformis, linear IgA disease, epidermolysis bullosa acquisita, dermatomyositis, lupus erythematosus, scleroderma and morphea, erythroderma, alopecia, figurate skin lesions, telangiectasias, hypopigmentation, hyperpigmentation, vesicles/bullae, exanthems, cutaneous drug reactions, papulonodular skin lesions, chronic non-healing wounds, photosensitivity diseases, epidermolysis bullosa simplex, epidermolytic hyperkeratosis, epidermolytic and nonepidermolytic palmoplantar keratoderma, ichthyosis bullosa of Siemens, ichthyosis exfoliativa, keratosis palmaris et plantaris, keratosis palmoplantaris, palmoplantar keratoderma, keratosis punctata, Meesmann's corneal dystrophy, pachyonychia congenita, white sponge nevus, steatocystoma multiplex, epidermal nevi/epidermolytic hyperkeratosis type, monilethrix, trichothiodystrophy, chronic hepatitis/cryptogenic cirrhosis, and colorectal hyperplasia; a neurological disorder, such as epilepsy, ischemic cerebrovascular disease, stroke, cerebral neoplasms, Alzheimer's disease, Pick's disease, Huntington's disease, dementia, Parkinson's disease and other extrapyramidal disorders, amyotrophic lateral sclerosis and other motor neuron disorders, progressive neural muscular atrophy, retinitis pigmentosa, hereditary ataxias, multiple sclerosis and other demyelinating diseases, bacterial and viral meningitis, brain abscess, subdural empyema, epidural abscess, suppurative intracranial thrombophlebitis, myelitis and radiculitis, viral central nervous system disease, prion diseases including kuru, Creutzfeldt-Jakob disease, and Gerstmann-Straussler-Scheinker syndrome, fatal familial insomnia, nutritional and metabolic diseases of the nervous system, neurofibromatosis, tuberous sclerosis, cerebelloretinal hemangioblastomatosis, encephalotrigeminal syndrome, mental retardation and other developmental disorders of the central nervous system including Down syndrome, cerebral palsy, neuroskeletal disorders, autonomic nervous system disorders, cranial nerve disorders, spinal cord diseases, muscular dystrophy and other neuromuscular disorders, peripheral nervous system disorders, dermatomyositis and polymyositis, inherited, metabolic, endocrine, and toxic myopathies, myasthenia gravis, periodic paralysis, mental disorders including mood, anxiety, and schizophrenic disorders, seasonal affective disorder (SAD), akathesia, amnesia, catatonia, diabetic neuropathy, tardive dyskinesia, dystonias, paranoid psychoses, postherpetic neuralgia, Tourette's disorder, progressive supranuclear palsy, corticobasal degeneration, and familial frontotemporal dementia; and a reproductive disorder, such as infertility, including tubal disease, ovulatory defects, and endometriosis, a disorder of prolactin production, a disruption of the estrous cycle, a disruption of the menstrual cycle, polycystic ovary syndrome, ovarian hyperstimulation syndrome, an endometrial or ovarian tumor, a uterine fibroid, autoimmune disorders, an ectopic pregnancy, and teratogenesis; cancer of the breast, fibrocystic breast disease, and galactorrhea; a disruption of spermatogenesis, abnormal sperm physiology, cancer of the testis, cancer of the prostate, benign prostatic hyperplasia, prostatitis, Peyronie's disease, impotence, carcinoma of the male breast, and gynecomastia. The polynucleotide sequences encoding PRTS may be used in Southern or northern analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect altered PRTS expression. Such qualitative or quantitative methods are well known in the art.

[0249] In a particular aspect, the nucleotide sequences encoding PRTS may be useful in assays that detect the presence of associated disorders, particularly those mentioned above. The nucleotide sequences encoding PRTS may be labeled by standard methods and added to a fluid or tissue sample from a patient under conditions suitable for the formation of hybridization complexes. After a suitable incubation period, the sample is washed and the signal is quantified and compared with a standard value. If the amount of signal in the patient sample is significantly altered in comparison to a control sample, then the presence of altered levels of nucleotide sequences encoding PRTS in the sample indicates the presence of the associated disorder. Such assays may also be used to evaluate the efficacy of a particular therapeutic treatment regimen in animal studies or in clinical trials, or to monitor the treatment of an individual patient.

[0250] In order to provide a basis for the diagnosis of a disorder associated with expression of PRTS, a normal or standard profile for expression is established. This may be accomplished by combining body fluids or cell extracts taken from normal subjects, either animal or human, with a sequence, or a fragment thereof, encoding PRTS, under conditions suitable for hybridization or amplification. Standard hybridization may be quantified by comparing the values obtained from normal subjects with values from an experiment in which a known amount of a substantially purified polynucleotide is used. Standard values obtained in this manner may be compared with values obtained from samples from patients who are symptomatic for a disorder. Deviation from standard values is used to establish the presence of a disorder.

[0251] Once the presence of a disorder is established and a treatment protocol is initiated, hybridization assays may be repeated on a regular basis to determine if the level of expression in the patient begins to approximate that which is observed in the normal subject. The results obtained from successive assays may be used to show the efficacy of treatment over a period ranging from several days to months.

[0252] With respect to cancer, the presence of an abnormal amount of transcript (either under- or overexpressed) in biopsied tissue from an individual may indicate a predisposition for the development of the disease, or may provide a means for detecting the disease prior to the appearance of actual clinical symptoms. A more definitive diagnosis of this type may allow health professionals to employ preventative measures or aggressive treatment earlier, thereby preventing the development or further progression of the cancer.

[0253] Additional diagnostic uses for oligonucleotides designed from the sequences encoding PRTS may involve the use of PCR. These oligomers may be chemically synthesized, generated enzymatically, or produced in vitro. Oligomers will preferably contain a fragment of a polynucleotide encoding PRTS, or a fragment of a polynucleotide complementary to the polynucleotide encoding PRTS, and will be employed under optimized conditions for identification of a specific gene or condition. Oligomers may also be employed under less stringent conditions for detection or quantification of closely related DNA or RNA sequences.

[0254] In a particular aspect, oligonucleotide primers derived from the polynucleotide sequences encoding PRTS may be used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP) and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from the polynucleotide sequences encoding PRTS are used to amplify DNA using the polymerase chain reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary structures of PCR products in single-stranded form, and these differences are detectable using gel electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently labeled, which allows detection of the amplimers in high-throughput equipment such as DNA sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP (isSNP), are capable of identifying polymorphisms by comparing the sequence of individual overlapping DNA fragments which assemble into a common consensus sequence. These computer-based methods filter out sequence variations due to laboratory preparation of DNA and sequencing errors using statistical models and automated analyses of DNA sequence chromatograms. In the alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the high throughput MASSARRAY system (Sequenom, Inc., San Diego Calif.).
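
The in silico comparison of overlapping fragments against a common consensus can be sketched as follows. This is a simplified illustration under stated assumptions: the fragments are taken to be already aligned to equal length, with `-` marking positions a fragment does not cover, and the statistical error models and chromatogram analyses mentioned above are replaced by a plain count threshold (`min_minor_reads`, a hypothetical parameter). The function name `candidate_snps` is likewise invented for the example.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_reads=2):
    """Build a majority consensus and list positions with a recurring variant.

    Returns (consensus, [(position, consensus_base, variant_base), ...]).
    A variant seen in fewer than `min_minor_reads` fragments is treated as
    a likely sequencing or preparation error and is not reported.
    """
    if not aligned_reads:
        raise ValueError("at least one aligned fragment is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("fragments must be pre-aligned to equal length")

    consensus = []
    candidates = []
    for pos in range(length):
        # Count the bases observed at this column, ignoring uncovered positions.
        column = Counter(read[pos] for read in aligned_reads if read[pos] != "-")
        if not column:
            consensus.append("-")
            continue
        ranked = column.most_common()
        consensus.append(ranked[0][0])
        if len(ranked) > 1 and ranked[1][1] >= min_minor_reads:
            candidates.append((pos, ranked[0][0], ranked[1][0]))
    return "".join(consensus), candidates


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ATGT", "ATGT", "ACGA"]
    print(candidate_snps(reads))
```

In this toy input the C/T difference at position 1 recurs in two fragments and is reported, while the single A at position 3 falls below the threshold and is filtered out.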

[0255] Methods which may also be used to quantify the expression of PRTS include radiolabeling or biotinylating nucleotides, coamplification of a control nucleic acid, and interpolating results from standard curves. (See, e.g., Melby, P. C. et al. (1993) J. Immunol. Methods 159:235-244; Duplaa, C. et al. (1993) Anal. Biochem. 212:229-236.) The speed of quantitation of multiple samples may be accelerated by running the assay in a high-throughput format where the oligomer or polynucleotide of interest is presented in various dilutions and a spectrophotometric or colorimetric response gives rapid quantitation.
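Interpolation from a standard curve, as mentioned above, can be illustrated with a minimal sketch. The function name, the piecewise-linear interpolation scheme, and the example values are assumptions made for illustration only; the specification does not prescribe a particular curve-fitting method.

```python
from bisect import bisect_left


def interpolate_from_standard_curve(standards, signal):
    """Estimate an amount from a measured signal by piecewise-linear interpolation.

    standards: list of (signal, amount) pairs measured for known dilutions
               (hypothetical data layout, not taken from the specification).
    signal:    measured response for the unknown sample.
    """
    points = sorted(standards)
    signals = [s for s, _ in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the range of the standard curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        # Exact hit on a standard: return its known amount directly.
        return points[i][1]
    (s0, a0), (s1, a1) = points[i - 1], points[i]
    # Linear interpolation between the two bracketing standards.
    return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)


# Illustrative standards: (response, amount) for three known dilutions.
curve = [(0.0, 0.0), (1.0, 10.0), (2.0, 30.0)]
estimate = interpolate_from_standard_curve(curve, 1.5)
```

Signals outside the measured range are rejected rather than extrapolated, which is a conservative design choice for this sketch.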

[0256] In further embodiments, oligonucleotides or longer fragments derived from any of the polynucleotide sequences described herein may be used as elements on a microarray. The microarray can be used in transcript imaging techniques which monitor the relative expression levels of large numbers of genes simultaneously as described below. The microarray may also be used to identify genetic variants, mutations, and polymorphisms. This information may be used to determine gene function, to understand the genetic basis of a disorder, to diagnose a disorder, to monitor progression/regression of disease as a function of gene expression, and to develop and monitor the activities of therapeutic agents in the treatment of disease. In particular, this information may be used to develop a pharmacogenomic profile of a patient in order to select the most appropriate and effective treatment regimen for that patient. For example, therapeutic agents which are highly effective and display the fewest side effects may be selected for a patient based on his/her pharmacogenomic profile.

[0257] In another embodiment, PRTS, fragments of PRTS, or antibodies specific for PRTS may be used as elements on a microarray. The microarray may be used to monitor or measure protein-protein interactions, drug-target interactions, and gene expression profiles, as described above.

[0258] A particular embodiment relates to the use of the polynucleotides of the present invention to generate a transcript image of a tissue or cell type. A transcript image represents the global pattern of gene expression by a particular tissue or cell type. Global gene expression patterns are analyzed by quantifying the number of expressed genes and their relative abundance under given conditions and at a given time. (See Seilhamer et al., “Comparative Gene Transcript Analysis,” U.S. Pat. No. 5,840,484, expressly incorporated by reference herein.) Thus a transcript image may be generated by hybridizing the polynucleotides of the present invention or their complements to the totality of transcripts or reverse transcripts of a particular tissue or cell type. In one embodiment, the hybridization takes place in high-throughput format, wherein the polynucleotides of the present invention or their complements comprise a subset of a plurality of elements on a microarray. The resultant transcript image would provide a profile of gene activity.

[0259] Transcript images may be generated using transcripts isolated from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect gene expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell line.

[0260] Transcript images which profile the expression of the polynucleotides of the present invention may also be used in conjunction with in vitro model systems and preclinical evaluation of pharmaceuticals, as well as toxicological testing of industrial and naturally-occurring environmental compounds. All compounds induce characteristic gene expression patterns, frequently termed molecular fingerprints or toxicant signatures, which are indicative of mechanisms of action and toxicity (Nuwaysir, E. F. et al. (1999) Mol. Carcinog. 24:153-159; Steiner, S. and N. L. Anderson (2000) Toxicol. Lett. 112-113:467-471, expressly incorporated by reference herein). If a test compound has a signature similar to that of a compound with known toxicity, it is likely to share those toxic properties. These fingerprints or signatures are most useful and refined when they contain expression information from a large number of genes and gene families. Ideally, a genome-wide measurement of expression provides the highest quality signature. Even genes whose expression is not altered by any tested compounds are important, as the levels of expression of these genes are used to normalize the rest of the expression data. The normalization procedure is useful for comparison of expression data after treatment with different compounds. While the assignment of gene function to elements of a toxicant signature aids in interpretation of toxicity mechanisms, knowledge of gene function is not necessary for the statistical matching of signatures which leads to prediction of toxicity. (See, for example, Press Release 00-02 from the National Institute of Environmental Health Sciences, released Feb. 29, 2000, available at http://www.niehs.nih.gov/oc/news/toxchip.htm.) Therefore, it is important and desirable in toxicological screening using toxicant signatures to include all expressed gene sequences.

[0261] In one embodiment, the toxicity of a test compound is assessed by treating a biological sample containing nucleic acids with the test compound. Nucleic acids that are expressed in the treated biological sample are hybridized with one or more probes specific to the polynucleotides of the present invention, so that transcript levels corresponding to the polynucleotides of the present invention may be quantified. The transcript levels in the treated biological sample are compared with levels in an untreated biological sample. Differences in the transcript levels between the two samples are indicative of a toxic response caused by the test compound in the treated sample.
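The comparison of treated and untreated transcript levels described above might be sketched as follows. The log2-ratio measure, the pseudocount, the threshold, and all names are assumptions introduced for illustration; the specification states only that differences between the two samples are indicative of a toxic response.

```python
import math


def differential_transcripts(treated, untreated, min_abs_log2=1.0, pseudocount=1.0):
    """Return probes whose level differs between treated and untreated samples.

    treated, untreated: dicts mapping a probe name to a quantified transcript level.
    min_abs_log2:       assumed cutoff on |log2(treated/untreated)| (hypothetical).
    pseudocount:        assumed small constant that avoids division by zero.
    """
    changes = {}
    # Only probes quantified in both samples can be compared.
    for probe in treated.keys() & untreated.keys():
        ratio = (treated[probe] + pseudocount) / (untreated[probe] + pseudocount)
        log2_ratio = math.log2(ratio)
        if abs(log2_ratio) >= min_abs_log2:
            changes[probe] = log2_ratio
    return changes


# Hypothetical probe names and levels, for illustration only.
treated_levels = {"probe_A": 7.0, "probe_B": 10.0}
untreated_levels = {"probe_A": 1.0, "probe_B": 10.0}
changed = differential_transcripts(treated_levels, untreated_levels)
```

The same comparison could in principle be applied to protein amounts in place of transcript levels.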

[0262] Another particular embodiment relates to the use of the polypeptide sequences of the present invention to analyze the proteome of a tissue or cell type. The term proteome refers to the global pattern of protein expression in a particular tissue or cell type. Each protein component of a proteome can be subjected individually to further analysis. Proteome expression patterns, or profiles, are analyzed by quantifying the number of expressed proteins and their relative abundance under given conditions and at a given time. A profile of a cell's proteome may thus be generated by separating and analyzing the polypeptides of a particular tissue or cell type. In one embodiment, the separation is achieved using two-dimensional gel electrophoresis, in which proteins from a sample are separated by isoelectric focusing in the first dimension, and then according to molecular weight by sodium dodecyl sulfate slab gel electrophoresis in the second dimension (Steiner and Anderson, supra). The proteins are visualized in the gel as discrete and uniquely positioned spots, typically by staining the gel with an agent such as Coomassie Blue or silver or fluorescent stains. The optical density of each protein spot is generally proportional to the level of the protein in the sample. The optical densities of equivalently positioned protein spots from different samples, for example, from biological samples either treated or untreated with a test compound or therapeutic agent, are compared to identify any changes in protein spot density related to the treatment. The proteins in the spots are partially sequenced using, for example, standard methods employing chemical or enzymatic cleavage followed by mass spectrometry. The identity of the protein in a spot may be determined by comparing its partial sequence, preferably of at least 5 contiguous amino acid residues, to the polypeptide sequences of the present invention. In some cases, further sequence data may be obtained for definitive protein identification.
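The final identification step above, comparing a partial sequence of at least 5 contiguous residues to known polypeptide sequences, can be sketched as an exact substring search. The function name and the example sequences are hypothetical placeholders, not sequences of the invention, and a practical comparison would more likely tolerate mismatches.

```python
def identify_spot(partial_sequence, polypeptides, min_length=5):
    """Return names of polypeptides containing the partial sequence exactly.

    partial_sequence: contiguous amino acid residues read from a gel spot.
    polypeptides:     dict mapping a polypeptide name to its full sequence.
    min_length:       minimum number of contiguous residues required (5 per the text).
    """
    if len(partial_sequence) < min_length:
        raise ValueError("partial sequence is shorter than the minimum length")
    return sorted(
        name for name, sequence in polypeptides.items()
        if partial_sequence in sequence
    )


# Made-up sequences for illustration only.
reference = {
    "polypeptide_1": "MKTLLVAGHSDE",
    "polypeptide_2": "MAGHSDEWQRTY",
    "polypeptide_3": "MPPLNNKQ",
}
hits = identify_spot("AGHSD", reference)
```

More than one hit, as in this example, corresponds to the case in which further sequence data would be needed for definitive identification.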

[0263] A proteomic profile may also be generated using antibodies specific for PRTS to quantify the levels of PRTS expression. In one embodiment, the antibodies are used as elements on a microarray, and protein expression levels are quantified by exposing the microarray to the sample and detecting the levels of protein bound to each array element (Lueking, A. et al. (1999) Anal. Biochem. 270:103-111; Mendoza, L. G. et al. (1999) Biotechniques 27:778-788). Detection may be performed by a variety of methods known in the art, for example, by reacting the proteins in the sample with a thiol- or amino-reactive fluorescent compound and detecting the amount of fluorescence bound at each array element.

[0264] Toxicant signatures at the proteome level are also useful for toxicological screening, and should be analyzed in parallel with toxicant signatures at the transcript level. There is a poor correlation between transcript and protein abundances for some proteins in some tissues (Anderson, N. L. and J. Seilhamer (1997) Electrophoresis 18:533-537), so proteome toxicant signatures may be useful in the analysis of compounds which do not significantly affect the transcript image, but which alter the proteomic profile. In addition, the analysis of transcripts in body fluids is difficult, due to rapid degradation of mRNA, so proteomic profiling may be more reliable and informative in such cases.

[0265] In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins that are expressed in the treated biological sample are separated so that the amount of each protein can be quantified. The amount of each protein is compared to the amount of the corresponding protein in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample. Individual proteins are identified by sequencing the amino acid residues of the individual proteins and comparing these partial sequences to the polypeptides of the present invention.

[0266] In another embodiment, the toxicity of a test compound is assessed by treating a biological sample containing proteins with the test compound. Proteins from the biological sample are incubated with antibodies specific to the polypeptides of the present invention. The amount of protein recognized by the antibodies is quantified. The amount of protein in the treated biological sample is compared with the amount in an untreated biological sample. A difference in the amount of protein between the two samples is indicative of a toxic response to the test compound in the treated sample.

[0267] Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g., Brennan, T. M. et al. (1995) U.S. Pat. No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R. A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; and Heller, M. J. et al. (1997) U.S. Pat. No. 5,605,662.) Various types of microarrays are well known and thoroughly described in DNA Microarrays: A Practical Approach, M. Schena, ed. (1999) Oxford University Press, London, hereby expressly incorporated by reference.

[0268] In another embodiment of the invention, nucleic acid sequences encoding PRTS may be used to generate hybridization probes useful in mapping the naturally occurring genomic sequence. Either coding or noncoding sequences may be used, and in some instances, noncoding sequences may be preferable over coding sequences. For example, conservation of a coding sequence among members of a multi-gene family may potentially cause undesired cross hybridization during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J. J. et al. (1997) Nat. Genet. 15:345-355; Price, C. M. (1993) Blood Rev. 7:127-134; and Trask, B. J. (1991) Trends Genet. 7:149-154.) Once mapped, the nucleic acid sequences of the invention may be used to develop genetic linkage maps, for example, which correlate the inheritance of a disease state with the inheritance of a particular chromosome region or restriction fragment length polymorphism (RFLP). (See, for example, Lander, E. S. and D. Botstein (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.)

[0269] Fluorescent in situ hybridization (FISH) may be correlated with other physical and genetic map data. (See, e.g., Heinz-Ulrich, et al. (1995) in Meyers, supra, pp. 965-968.) Examples of genetic map data can be found in various scientific journals or at the Online Mendelian Inheritance in Man (OMIM) World Wide Web site. Correlation between the location of the gene encoding PRTS on a physical map and a specific disorder, or a predisposition to a specific disorder, may help define the region of DNA associated with that disorder and thus may further positional cloning efforts.

[0270] In situ hybridization of chromosomal preparations and physical mapping techniques, such as linkage analysis using established chromosomal markers, may be used for extending genetic maps. Often the placement of a gene on the chromosome of another mammalian species, such as mouse, may reveal associated markers even if the exact chromosomal locus is not known. This information is valuable to investigators searching for disease genes using positional cloning or other gene discovery techniques. Once the gene or genes responsible for a disease or syndrome have been crudely localized by genetic linkage to a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any sequences mapping to that area may represent associated or regulatory genes for further investigation. (See, e.g., Gatti, R. A. et al. (1988) Nature 336:577-580.) The nucleotide sequence of the instant invention may also be used to detect differences in the chromosomal location due to translocation, inversion, etc., among normal, carrier, or affected individuals.

[0271] In another embodiment of the invention, PRTS, its catalytic or immunogenic fragments, or oligopeptides thereof can be used for screening libraries of compounds in any of a variety of drug screening techniques. The fragment employed in such screening may be free in solution, affixed to a solid support, borne on a cell surface, or located intracellularly. The formation of binding complexes between PRTS and the agent being tested may be measured.

[0272] Another technique for drug screening provides for high throughput screening of compounds having suitable binding affinity to the protein of interest. (See, e.g., Geysen, et al. (1984) PCT application WO84/03564.) In this method, large numbers of different small test compounds are synthesized on a solid substrate. The test compounds are reacted with PRTS, or fragments thereof, and washed. Bound PRTS is then detected by methods well known in the art. Purified PRTS can also be coated directly onto plates for use in the aforementioned drug screening techniques. Alternatively, non-neutralizing antibodies can be used to capture the peptide and immobilize it on a solid support.

[0273] In another embodiment, one may use competitive drug screening assays in which neutralizing antibodies capable of binding PRTS specifically compete with a test compound for binding PRTS. In this manner, antibodies can be used to detect the presence of any peptide which shares one or more antigenic determinants with PRTS.

[0274] In additional embodiments, the nucleotide sequences which encode PRTS may be used in any molecular biology techniques that have yet to be developed, provided the new techniques rely on properties of nucleotide sequences that are currently known, including, but not limited to, such properties as the triplet genetic code and specific base pair interactions.

[0275] Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.

[0276] The disclosures of all patents, applications and publications, mentioned above and below, including U.S. Ser. No. 60/220,063, U.S. Ser. No. 60/221,680, U.S. Ser. No. 60/223,544, U.S. Ser. No. 60/224,717, U.S. Ser. No. 60/225,988, and U.S. Ser. No. 60/227,568 are expressly incorporated by reference herein.

EXAMPLES

[0277] I. Construction of cDNA Libraries

[0278] Incyte cDNAs were derived from cDNA libraries described in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto Calif.) and shown in Table 4, column 5. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was precipitated from the lysates with either isopropanol or sodium acetate and ethanol, or by other routine methods.

[0279] Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA purity. In some cases, RNA was treated with DNase. For most libraries, poly(A)+ RNA was isolated using oligo d(T)-coupled paramagnetic particles (Promega), OLIGOTEX latex particles (QIAGEN, Chatsworth Calif.), or an OLIGOTEX mRNA purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Austin Tex.).

[0280] In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP vector system (Stratagene) or SUPERSCRIPT plasmid system (Life Technologies), using the recommended procedures or similar methods known in the art. (See, e.g., Ausubel, 1997, supra, units 5.1-6.6.) Reverse transcription was initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000, SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene), PSPORT1 plasmid (Life Technologies), PcDNA2.1 plasmid (Invitrogen, Carlsbad Calif.), PBK-CMV plasmid (Stratagene), or pINCY (Incyte Genomics, Palo Alto Calif.), or derivatives thereof. Recombinant plasmids were transformed into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α, DH10B, or ElectroMAX DH10B from Life Technologies.

[0281] II. Isolation of cDNA Clones

[0282] Plasmids obtained as described in Example I were recovered from host cells by in vivo excision using the UNIZAP vector system (Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: a Magic or WIZARD Minipreps DNA purification system (Promega); an AGTC Miniprep purification kit (Edge Biosystems, Gaithersburg Md.); and QIAWELL 8 Plasmid, QIAWELL 8 Plus Plasmid, QIAWELL 8 Ultra Plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit from QIAGEN. Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or without lyophilization, at 4° C.

[0283] Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a high-throughput format (Rao, V. B. (1994) Anal. Biochem. 216:1-14). Host cell lysis and thermal cycling steps were carried out in a single reaction mixture. Samples were processed and stored in 384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically using PICOGREEN dye (Molecular Probes, Eugene Oreg.) and a FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).

[0284] III. Sequencing and Analysis

[0285] Incyte cDNAs recovered in plasmids as described in Example II were sequenced as follows. Sequencing reactions were processed using standard methods or high-throughput instrumentation such as the ABI CATALYST 800 (Applied Biosystems) thermal cycler or the PTC-200 thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific) or the MICROLAB 2200 (Hamilton) liquid transfer system. cDNA sequencing reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular Dynamics); the ABI PRISM 373 or 377 sequencing system (Applied Biosystems) in conjunction with standard ABI protocols and base calling software; or other sequence analysis systems known in the art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in Ausubel, 1997, supra, unit 7.7). Some of the cDNA sequences were selected for extension using the techniques disclosed in Example VIII.

[0286] The polynucleotide sequences derived from Incyte cDNAs were validated by removing vector, linker, and poly(A) sequences and by masking ambiguous bases, using algorithms and programs based on BLAST, dynamic programming, and dinucleotide nearest neighbor analysis. The Incyte cDNA sequences or translations thereof were then queried against a selection of public databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases, and BLOCKS, PRINTS, DOMO, PRODOM, and hidden Markov model (HMM)-based protein family databases such as PFAM. (HMM is a probabilistic approach which analyzes consensus primary structures of gene families. See, for example, Eddy, S. R. (1996) Curr. Opin. Struct. Biol. 6:361-365.) The queries were performed using programs based on BLAST, FASTA, BLIMPS, and HMMER. The Incyte cDNA sequences were assembled to produce full length polynucleotide sequences. Alternatively, GenBank cDNAs, GenBank ESTs, stitched sequences, stretched sequences, or Genscan-predicted coding sequences (see Examples IV and V) were used to extend Incyte cDNA assemblages to full length. Assembly was performed using programs based on Phred, Phrap, and Consed, and cDNA assemblages were screened for open reading frames using programs based on GeneMark, BLAST, and FASTA. The full length polynucleotide sequences were translated to derive the corresponding full length polypeptide sequences. Alternatively, a polypeptide of the invention may begin at any of the methionine residues of the full length translated polypeptide. Full length polypeptide sequences were subsequently analyzed by querying against databases such as the GenBank protein databases (genpept), SwissProt, BLOCKS, PRINTS, DOMO, PRODOM, Prosite, and hidden Markov model (HMM)-based protein family databases such as PFAM. Full length polynucleotide sequences are also analyzed using MAcDNASIS PRO software (Hitachi Software Engineering, South San Francisco Calif.) and LASERGENE software (DNASTAR). Polynucleotide and polypeptide sequence alignments are generated using default parameters specified by the CLUSTAL algorithm as incorporated into the MEGALIGN multisequence alignment program (DNASTAR), which also calculates the percent identity between aligned sequences.

[0287] Table 7 summarizes the tools, programs, and algorithms used for the analysis and assembly of Incyte cDNA and full length sequences and provides applicable descriptions, references, and threshold parameters. The first column of Table 7 shows the tools, programs, and algorithms used, the second column provides brief descriptions thereof, the third column presents appropriate references, all of which are incorporated by reference herein in their entirety, and the fourth column presents, where applicable, the scores, probability values, and other parameters used to evaluate the strength of a match between two sequences (the higher the score or the lower the probability value, the greater the identity between two sequences).

[0288] The programs described above for the assembly and analysis of full length polynucleotide and polypeptide sequences were also used to identify polynucleotide sequence fragments from SEQ ID NO:22-42. Fragments from about 20 to about 4000 nucleotides which are useful in hybridization and amplification technologies are described in Table 4, column 4.

[0289] IV. Identification and Editing of Coding Sequences from Genomic DNA

[0290] Putative proteases were initially identified by running the Genscan gene identification program against public genomic sequence databases (e.g., gbpri and gbhtg). Genscan is a general-purpose gene identification program which analyzes genomic DNA sequences from a variety of organisms (See Burge, C. and S. Karlin (1997) J. Mol. Biol. 268:78-94, and Burge, C. and S. Karlin (1998) Curr. Opin. Struct. Biol. 8:346-354). The program concatenates predicted exons to form an assembled cDNA sequence extending from a methionine to a stop codon. The output of Genscan is a FASTA database of polynucleotide and polypeptide sequences. The maximum range of sequence for Genscan to analyze at once was set to 30 kb. To determine which of these Genscan predicted cDNA sequences encode proteases, the encoded polypeptides were analyzed by querying against PFAM models for proteases. Potential proteases were also identified by homology to Incyte cDNA sequences that had been annotated as proteases. These selected Genscan-predicted sequences were then compared by BLAST analysis to the genpept and gbpri public databases. Where necessary, the Genscan-predicted sequences were then edited by comparison to the top BLAST hit from genpept to correct errors in the sequence predicted by Genscan, such as extra or omitted exons. BLAST analysis was also used to find any Incyte cDNA or public cDNA coverage of the Genscan-predicted sequences, thus providing evidence for transcription. When Incyte cDNA coverage was available, this information was used to correct or confirm the Genscan predicted sequence. Full length polynucleotide sequences were obtained by assembling Genscan-predicted coding sequences with Incyte cDNA sequences and/or public cDNA sequences using the assembly process described in Example III. Alternatively, full length polynucleotide sequences were derived entirely from edited or unedited Genscan-predicted coding sequences.

[0291] V. Assembly of Genomic Sequence Data with cDNA Sequence Data

[0292] “Stitched” Sequences

[0293] Partial cDNA sequences were extended with exons predicted by the Genscan gene identification program described in Example IV. Partial cDNAs assembled as described in Example III were mapped to genomic DNA and parsed into clusters containing related cDNAs and Genscan exon predictions from one or more genomic sequences. Each cluster was analyzed using an algorithm based on graph theory and dynamic programming to integrate cDNA and genomic information, generating possible splice variants that were subsequently confirmed, edited, or extended to create a full length sequence. Sequence intervals in which the entire length of the interval was present on more than one sequence in the cluster were identified, and intervals thus identified were considered to be equivalent by transitivity. For example, if an interval was present on a cDNA and two genomic sequences, then all three intervals were considered to be equivalent. This process allows unrelated but consecutive genomic sequences to be brought together, bridged by cDNA sequence. Intervals thus identified were then “stitched” together by the stitching algorithm in the order that they appear along their parent sequences to generate the longest possible sequence, as well as sequence variants. Linkages between intervals which proceed along one type of parent sequence (cDNA to cDNA or genomic sequence to genomic sequence) were given preference over linkages which change parent type (cDNA to genomic sequence). The resultant stitched sequences were translated and compared by BLAST analysis to the genpept and gbpri public databases. Incorrect exons predicted by Genscan were corrected by comparison to the top BLAST hit from genpept. Sequences were further extended with additional cDNA sequences, or by inspection of genomic DNA, when necessary.

[0294] “Stretched” Sequences

[0295] Partial DNA sequences were extended to full length with an algorithm based on BLAST analysis. First, partial cDNAs assembled as described in Example III were queried against public databases such as the GenBank primate, rodent, mammalian, vertebrate, and eukaryote databases using the BLAST program. The nearest GenBank protein homolog was then compared by BLAST analysis to either Incyte cDNA sequences or Genscan exon predicted sequences described in Example IV. A chimeric protein was generated by using the resultant high-scoring segment pairs (HSPs) to map the translated sequences onto the GenBank protein homolog. Insertions or deletions may occur in the chimeric protein with respect to the original GenBank protein homolog. The GenBank protein homolog, the chimeric protein, or both were used as probes to search for homologous genomic sequences from the public human genome databases. Partial DNA sequences were therefore “stretched” or extended by the addition of homologous genomic sequences. The resultant stretched sequences were examined to determine whether they contained a complete gene.

[0296] VI. Chromosomal Mapping of PRTS Encoding Polynucleotides

[0297] The sequences which were used to assemble SEQ ID NO:22-42 were compared with sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other implementations of the Smith-Waterman algorithm. Sequences from these databases that matched SEQ ID NO:22-42 were assembled into clusters of contiguous and overlapping sequences using assembly algorithms such as Phrap (Table 7). Radiation hybrid and genetic mapping data available from public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for Genome Research (WIGR), and Généthon were used to determine if any of the clustered sequences had been previously mapped. Inclusion of a mapped sequence in a cluster resulted in the assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location.
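The propagation rule in the last sentence above, where one previously mapped member assigns its map location to every sequence in the cluster, can be sketched as follows. The data layout, identifiers, and interval values are hypothetical and chosen only for illustration.

```python
def assign_map_locations(clusters, known_locations):
    """Propagate known map locations to every member of each cluster.

    clusters:        dict mapping a cluster id to a list of sequence ids.
    known_locations: dict mapping a previously mapped sequence id to a
                     (chromosome, start_cM, end_cM) tuple (assumed layout).
    Returns a dict mapping each sequence id in a mapped cluster to the sorted
    list of all locations found in that cluster.
    """
    assigned = {}
    for members in clusters.values():
        # Collect every distinct location carried by a mapped member.
        locations = sorted({known_locations[m] for m in members if m in known_locations})
        if not locations:
            continue  # no member of this cluster was previously mapped
        for member in members:
            assigned[member] = locations
    return assigned


# Hypothetical cluster contents and marker positions.
example_clusters = {
    "cluster_1": ["query_seq", "est_a", "marker_x"],
    "cluster_2": ["est_b", "est_c"],
}
example_known = {"marker_x": ("16", 81.8, 84.4)}
mapped = assign_map_locations(example_clusters, example_known)
```

A cluster containing members mapped to different positions yields more than one location per sequence in this sketch.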

[0298] Map locations are represented by ranges, or intervals, of human chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase (Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.) The cM distances are based on genetic markers mapped by Généthon which provide boundaries for radiation hybrid markers whose sequences were included in each of the clusters. Human genome maps and other resources available to the public, such as the NCBI “GeneMap'99” World Wide Web site (http://www.ncbi.nlm.nih.gov/genemap/), can be employed to determine if previously identified disease genes map within or in proximity to the intervals indicated above.

[0299] In this manner, SEQ ID NO:37 was mapped to chromosome 17 within the interval from 69.3 to 74.5 centiMorgans, and to chromosome 23 within the interval from 68.2 to 90.8 centiMorgans. Similarly, SEQ ID NO:32 was mapped to chromosome 16 within the interval from 81.8 to 84.4 centiMorgans. Additionally, SEQ ID NO:31 was mapped to chromosome 3 within the interval from 88.2 to 90.1 centiMorgans, and within the interval from 91.0 to 97.2 centiMorgans. More than one map location is reported for SEQ ID NO:37 and SEQ ID NO:31, indicating that sequences having different map locations were assembled into a single cluster. This situation occurs, for example, when sequences having strong similarity, but not complete identity, are assembled into a single cluster.

[0300] VII. Analysis of Polynucleotide Expression

[0301] Northern analysis is a laboratory technique used to detect the presence of a transcript of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra ch. 7; Ausubel (1995) supra, ch. 4 and 16.)

[0302] Analogous computer techniques applying BLAST were used to search for identical or related molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Genomics). This analysis is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the computer search can be modified to determine whether any particular match is categorized as exact or similar. The basis of the search is the product score, which is defined as:

$\frac{\text{BLAST Score} \times \text{Percent Identity}}{5 \times \min\{\text{length}(\text{Seq. 1}),\ \text{length}(\text{Seq. 2})\}}$

[0303] The product score takes into account both the degree of similarity between two sequences and the length of the sequence match. The product score is a normalized value between 0 and 100, and is calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair (HSP), and −4 for every mismatch. Two sequences may share more than one HSP (separated by gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate the product score. The product score represents a balance between fractional overlap and quality in a BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the entire length of the shorter of the two sequences being compared. A product score of 70 is produced either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79% identity and 100% overlap.
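The product score calculation described above can be sketched as follows. This is a minimal illustration that assumes an ungapped HSP scored only with the +5 match and −4 mismatch values stated in the text; the function names `hsp_blast_score` and `product_score` are hypothetical.

```python
def hsp_blast_score(matches, mismatches):
    """Score one high-scoring segment pair: +5 per match, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(blast_score, percent_identity, len_seq1, len_seq2):
    """BLAST score x percent identity, divided by 5 x the shorter sequence length."""
    return blast_score * percent_identity / (5 * min(len_seq1, len_seq2))


# 100% identity over the full length of a 100-base shorter sequence.
full = product_score(hsp_blast_score(100, 0), 100, 100, 250)

# 100% identity over 70% of the shorter sequence.
partial = product_score(hsp_blast_score(70, 0), 100, 100, 250)

# 88% identity over the full length of the shorter sequence.
imperfect = product_score(hsp_blast_score(88, 12), 88, 100, 250)
```

Under these assumptions `full` evaluates to 100 and `partial` to 70, while `imperfect` evaluates to approximately 69, consistent with the approximate value of 70 given in the text for 88% identity and 100% overlap.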

[0304] Alternatively, polynucleotide sequences encoding PRTS are analyzed with respect to the tissue sources from which they were derived. For example, some full length sequences are assembled, at least in part, with overlapping Incyte cDNA sequences (see Example III). Each cDNA sequence is derived from a cDNA library constructed from a human tissue. Each human tissue is classified into one of the following organ/tissue categories: cardiovascular system; connective tissue; digestive system; embryonic structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system; sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract. The number of libraries in each category is counted and divided by the total number of libraries across all categories. Similarly, each human tissue is classified into one of the following disease/condition categories: cancer, cell line, developmental, inflammation, neurological, trauma, cardiovascular, pooled, and other, and the number of libraries in each category is counted and divided by the total number of libraries across all categories. The resulting percentages reflect the tissue- and disease-specific expression of cDNA encoding PRTS. cDNA sequences and cDNA library/tissue information are found in the LIFESEQ GOLD database (Incyte Genomics, Palo Alto Calif.).
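The per-category percentage described above can be sketched as a simple count-and-divide step. The function name `category_percentages` and the input format (one category label per library) are assumptions made for illustration; the example labels are invented and do not represent data from the LIFESEQ GOLD database.

```python
from collections import Counter


def category_percentages(library_categories):
    """Map each category to the percentage of libraries assigned to it.

    library_categories holds one category label per cDNA library in which
    the sequence was found.
    """
    counts = Counter(library_categories)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented example: four libraries, two of them from liver.
example = category_percentages(["liver", "liver", "skin", "nervous system"])
```

The same function applies unchanged to the disease/condition categories, since only the labels differ.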

[0305] VIII. Extension of PRTS Encoding Polynucleotides

[0306] Full length polynucleotide sequences were also produced by extension of an appropriate fragment of the full length molecule using oligonucleotide primers designed from this fragment. One primer was synthesized to initiate 5′ extension of the known fragment, and the other primer was synthesized to initiate 3′ extension of the known fragment. The initial primers were designed using OLIGO 4.06 software (National Biosciences), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68° C. to about 72° C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations was avoided.
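Two of the primer criteria stated above, length and GC content, lend themselves to a simple screening step, sketched below. This is not the OLIGO 4.06 algorithm; the function names and the example primer sequences are hypothetical, and annealing temperature, hairpin formation, and primer-primer dimerization are deliberately not evaluated here.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def meets_basic_criteria(primer, min_len=22, max_len=30, min_gc=0.50):
    """Check only the length (about 22 to 30 nt) and GC content (about 50% or more).

    The thresholds are treated as hard cutoffs here for simplicity, although
    the text states them as approximate values.
    """
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


# Invented 24-nucleotide candidates used only to exercise the checks.
candidate_ok = "GCGCATATGCGCATATGCGCATAT"   # 12 of 24 bases are G or C
candidate_low_gc = "ATATATATATATATATATATATAT"
```

A candidate passing this screen would still need to be checked for annealing temperature and secondary structure with an appropriate program.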

[0307] Selected human cDNA libraries were used to extend the sequence. If more than one extension was necessary or desired, additional or nested sets of primers were designed.

[0308] High fidelity amplification was obtained by PCR using methods well known in the art. PCR was performed in 96-well plates using the PTC-200 thermal cycler (MJ Research, Inc.). The reaction mix contained DNA template, 200 nmol of each primer, reaction buffer containing Mg²⁺, (NH₄)₂SO₄, and 2-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94° C., 3 min; Step 2: 94° C., 15 sec; Step 3: 60° C., 1 min; Step 4: 68° C., 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68° C., 5 min; Step 7: storage at 4° C. In the alternative, the parameters for primer pair T7 and SK+ were as follows: Step 1: 94° C., 3 min; Step 2: 94° C., 15 sec; Step 3: 57° C., 1 min; Step 4: 68° C., 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68° C., 5 min; Step 7: storage at 4° C.

[0309] The concentration of DNA in each well was determined by dispensing 100 μl PICOGREEN quantitation reagent (0.25% (v/v) PICOGREEN; Molecular Probes, Eugene Oreg.) dissolved in 1×TE and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar, Acton Mass.), allowing the DNA to bind to the reagent. The plate was scanned in a Fluoroskan II (Labsystems Oy, Helsinki, Finland) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture was analyzed by electrophoresis on a 1% agarose gel to determine which reactions were successful in extending the sequence.

[0310] The extended nucleotides were desalted and concentrated, transferred to 384-well plates, digested with CviJI chlorella virus endonuclease (Molecular Biology Research, Madison Wis.), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides were separated on low concentration (0.6 to 0.8%) agarose gels, fragments were excised, and agar digested with Agar ACE (Promega). Extended clones were religated using T4 ligase (New England Biolabs, Beverly Mass.) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells were selected on antibiotic-containing media, and individual colonies were picked and cultured overnight at 37° C. in 384-well plates in LB/2× carb liquid media.

[0311] The cells were lysed, and DNA was amplified by PCR using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94° C., 3 min; Step 2: 94° C., 15 sec; Step 3: 60° C., 1 min; Step 4: 72° C., 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72° C., 5 min; Step 7: storage at 4° C. DNA was quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries were reamplified using the same conditions as described above. Samples were diluted with 20% dimethylsulfoxide (1:2, v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (Applied Biosystems).

[0312] In like manner, full length polynucleotide sequences are verified using the above procedure or are used to obtain 5′ regulatory sequences using the above procedure along with oligonucleotides designed for such extension, and an appropriate genomic library.

[0313] IX. Labeling and Use of Individual Hybridization Probes

[0314] Hybridization probes derived from SEQ ID NO:22-42 are employed to screen cDNAs, genomic DNAs, or mRNAs. Although the labeling of oligonucleotides, consisting of about 20 base pairs, is specifically described, essentially the same procedure is used with larger nucleotide fragments. Oligonucleotides are designed using state-of-the-art software such as OLIGO 4.06 software (National Biosciences) and labeled by combining 50 pmol of each oligomer, 250 μCi of [γ-³²P] adenosine triphosphate (Amersham Pharmacia Biotech), and T4 polynucleotide kinase (DuPont NEN, Boston Mass.). The labeled oligonucleotides are substantially purified using a SEPHADEX G-25 superfine size exclusion dextran bead column (Amersham Pharmacia Biotech). An aliquot containing 10⁷ counts per minute of the labeled probe is used in a typical membrane-based hybridization analysis of human genomic DNA digested with one of the following endonucleases: Ase I, Bgl II, Eco RI, Pst I, Xba I, or Pvu II (DuPont NEN).

[0315] The DNA from each digest is fractionated on a 0.7% agarose gel and transferred to nylon membranes (Nytran Plus, Schleicher & Schuell, Durham NH). Hybridization is carried out for 16 hours at 40° C. To remove nonspecific signals, blots are sequentially washed at room temperature under conditions of up to, for example, 0.1× saline sodium citrate and 0.5% sodium dodecyl sulfate. Hybridization patterns are visualized using autoradiography or an alternative imaging means and compared.

[0316] X. Microarrays

[0317] The linkage or synthesis of array elements upon a microarray can be achieved utilizing photolithography, piezoelectric printing (ink-jet printing, See, e.g., Baldeschweiler, supra.), mechanical microspotting technologies, and derivatives thereof. The substrate in each of the aforementioned technologies should be uniform and solid with a non-porous surface (Schena (1999), supra). Suggested substrates include silicon, silica, glass slides, glass chips, and silicon wafers. Alternatively, a procedure analogous to a dot or slot blot may also be used to arrange and link elements to the surface of a substrate using thermal, UV, chemical, or mechanical bonding procedures. A typical array may be produced using available methods and machines well known to those of ordinary skill in the art and may contain any appropriate number of elements. (See, e.g., Schena, M. et al. (1995) Science 270:467-470; Shalon, D. et al. (1996) Genome Res. 6:639-645; Marshall, A. and J. Hodgson (1998) Nat. Biotechnol. 16:27-31.)

[0318] Full length cDNAs, Expressed Sequence Tags (ESTs), or fragments or oligomers thereof may comprise the elements of the microarray. Fragments or oligomers suitable for hybridization can be selected using software well known in the art such as LASERGENE software (DNASTAR). The array elements are hybridized with polynucleotides in a biological sample. The polynucleotides in the biological sample are conjugated to a fluorescent label or other molecular tag for ease of detection. After hybridization, nonhybridized nucleotides from the biological sample are removed, and a fluorescence scanner is used to detect hybridization at each array element. Alternatively, laser desorption and mass spectrometry may be used for detection of hybridization. The degree of complementarity and the relative abundance of each polynucleotide which hybridizes to an element on the microarray may be assessed. In one embodiment, microarray preparation and usage is described in detail below.

[0319] Tissue or Cell Sample Preparation

[0320] Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and poly(A)⁺ RNA is purified using the oligo-(dT) cellulose method. Each poly(A)⁺ RNA sample is reverse transcribed using MMLV reverse-transcriptase, 0.05 pg/μl oligo-(dT) primer (21mer), 1× first strand buffer, 0.03 units/μl RNase inhibitor, 500 μM dATP, 500 μM dGTP, 500 μM dTTP, 40 μM dCTP, 40 μM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription reaction is performed in a 25 μl volume containing 200 ng poly(A)⁺ RNA with GEMBRIGHT kits (Incyte). Specific control poly(A)⁺ RNAs are synthesized by in vitro transcription from non-coding yeast genomic DNA. After incubation at 37° C. for 2 hr, each reaction sample (one with Cy3 and another with Cy5 labeling) is treated with 2.5 μl of 0.5M sodium hydroxide and incubated for 20 minutes at 85° C. to stop the reaction and degrade the RNA. Samples are purified using two successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc. (CLONTECH), Palo Alto Calif.) and after combining, both reaction samples are ethanol precipitated using 1 μl of glycogen (1 mg/ml), 60 μl sodium acetate, and 300 μl of 100% ethanol. The sample is then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook N.Y.) and resuspended in 14 μl 5×SSC/0.2% SDS.

[0321] Microarray Preparation

[0322] Sequences of the present invention are used to generate array elements. Each array element is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5 μg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia Biotech).

[0323] Purified array elements are immobilized on polymer-coated glass slides. Glass microscope slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR Scientific Products Corporation (VWR), West Chester Pa.), washed extensively in distilled water, and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a 110° C. oven.

[0324] Array elements are applied to the coated glass substrate using a procedure described in U.S. Pat. No. 5,807,522, incorporated herein by reference. 1 μl of the array element DNA, at an average concentration of 100 ng/μl, is loaded into the open capillary printing element by a high-speed robotic apparatus. The apparatus then deposits about 5 nl of array element sample per slide.
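The quantities stated above imply a DNA mass per deposit and an upper bound on the number of deposits per capillary load, which can be worked through as follows. The function names are hypothetical, and the calculation assumes no dead volume or evaporation in the printing element, which a real apparatus would not achieve.

```python
def dna_per_deposit_ng(deposit_nl, conc_ng_per_ul):
    """Mass of DNA in one deposit, converting nanoliters to microliters."""
    return deposit_nl * conc_ng_per_ul / 1000.0


def max_deposits(load_ul, deposit_nl):
    """Upper bound on deposits from one load, ignoring dead volume (assumption)."""
    return int(load_ul * 1000 // deposit_nl)


# Values from the text: 1 ul load at 100 ng/ul, about 5 nl deposited per slide.
mass_ng = dna_per_deposit_ng(deposit_nl=5, conc_ng_per_ul=100)
slides = max_deposits(load_ul=1, deposit_nl=5)
```

Under these idealized assumptions each deposit carries about 0.5 ng of array element DNA, and a single 1 μl load could in principle supply up to 200 slides.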

[0325] Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene). Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water. Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate buffered saline (PBS) (Tropix, Inc., Bedford Mass.) for 30 minutes at 60° C. followed by washes in 0.2% SDS and distilled water as before.

[0326] Hybridization

[0327] Hybridization reactions contain 9 μl of sample mixture consisting of 0.2 μg each of Cy3 and Cy5 labeled cDNA synthesis products in 5×SSC, 0.2% SDS hybridization buffer. The sample mixture is heated to 65° C. for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8 cm² coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 μl of 5×SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5 hours at 60° C. The arrays are washed for 10 min at 45° C. in a first wash buffer (1×SSC, 0.1% SDS), three times for 10 minutes each at 45° C. in a second wash buffer (0.1×SSC), and dried.

[0328] Detection

[0329] Reporter-labeled hybridization complexes are detected with a microscope equipped with an Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara Calif.) capable of generating spectral lines at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is focused on the array using a 20× microscope objective (Nikon, Inc., Melville NY). The slide containing the array is placed on a computer-controlled X-Y stage on the microscope and raster-scanned past the objective. The 1.8 cm×1.8 cm array used in the present example is scanned with a resolution of 20 micrometers.

[0330] In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially. Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477, Hamamatsu Photonics Systems, Bridgewater N.J.) corresponding to the two fluorophores. Appropriate filters positioned between the array and the photomultiplier tubes are used to filter the signals. The emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source, although the apparatus is capable of recording the spectra from both fluorophores simultaneously.

[0331] The sensitivity of the scans is typically calibrated using the signal intensity generated by a cDNA control species added to the sample mixture at a known concentration. A specific location on the array contains a complementary DNA sequence, allowing the intensity of the signal at that location to be correlated with a weight ratio of hybridizing species of 1:100,000. When two samples from different sources (e.g., representing test and control cells), each labeled with a different fluorophore, are hybridized to a single array for the purpose of identifying genes that are differentially expressed, the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and adding identical amounts of each to the hybridization mixture.

[0332] The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital (A/D) conversion board (Analog Devices, Inc., Norwood MA) installed in an IBM-compatible PC computer. The digitized data are displayed as an image where the signal intensity is mapped using a linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping emission spectra) between the fluorophores using each fluorophore's emission spectrum.
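One plausible reading of the linear 20-color transformation described above is a division of the 12-bit range (0 to 4095) into 20 equal-width bins, with bin 0 displayed as blue and bin 19 as red. The following sketch shows that reading only; the equal-width binning and the function name `pseudocolor_bin` are assumptions, and the actual display software may map values differently.

```python
def pseudocolor_bin(value, n_colors=20, max_value=4095):
    """Map a digitized intensity to a color index from 0 (blue) to n_colors - 1 (red).

    Assumes equal-width bins spanning the full range of a 12-bit converter.
    """
    if not 0 <= value <= max_value:
        raise ValueError("value outside the range of the A/D converter")
    # Integer arithmetic keeps the mapping exact at bin boundaries.
    return min(value * n_colors // (max_value + 1), n_colors - 1)


lowest = pseudocolor_bin(0)       # weakest signal
middle = pseudocolor_bin(2048)    # mid-scale signal
highest = pseudocolor_bin(4095)   # saturated signal
```

With these assumptions a zero reading maps to index 0, a mid-scale reading to index 10, and a saturated reading to index 19.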

[0333] A grid is superimposed over the fluorescence signal image such that the signal from each spot is centered in each element of the grid. The fluorescence signal within each element is then integrated to obtain a numerical value corresponding to the average intensity of the signal. The software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
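The grid integration step above can be illustrated with a minimal sketch that divides an image into square grid elements and averages the pixel values within each. This is not the GEMTOOLS implementation; the function name `element_means`, the square fixed-size grid, and the toy image are assumptions chosen for brevity.

```python
def element_means(image, cell):
    """Average pixel intensity within each cell x cell grid element.

    image is a list of equal-length rows of pixel intensities. Elements at the
    right and bottom edges may be smaller if the image size is not a multiple
    of cell.
    """
    rows, cols = len(image), len(image[0])
    means = []
    for r0 in range(0, rows, cell):
        row_means = []
        for c0 in range(0, cols, cell):
            pixels = [
                image[r][c]
                for r in range(r0, min(r0 + cell, rows))
                for c in range(c0, min(c0 + cell, cols))
            ]
            row_means.append(sum(pixels) / len(pixels))
        means.append(row_means)
    return means


# Toy 4 x 4 image holding four 2 x 2 spots.
toy_image = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [0, 4, 2, 2],
    [4, 0, 2, 2],
]
toy_means = element_means(toy_image, cell=2)
```

For the toy image the four grid elements average to 1, 5, 2, and 2, one value per spot.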

[0334] XI. Complementary Polynucleotides

[0335] Sequences complementary to the PRTS-encoding sequences, or any parts thereof, are used to detect, decrease, or inhibit expression of naturally occurring PRTS. Although use of oligonucleotides comprising from about 15 to 30 base pairs is described, essentially the same procedure is used with smaller or with larger sequence fragments. Appropriate oligonucleotides are designed using OLIGO 4.06 software (National Biosciences) and the coding sequence of PRTS. To inhibit transcription, a complementary oligonucleotide is designed from the most unique 5′ sequence and used to prevent promoter binding to the coding sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal binding to the PRTS-encoding transcript.
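Deriving a complementary oligonucleotide from a stretch of coding sequence amounts to taking its reverse complement, as sketched below. The function name and the example sequence are hypothetical; selection of the target region and evaluation of the oligonucleotide's properties remain the task of a design program such as the one named above.

```python
# Translation table for DNA bases, preserving upper or lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# Invented 6-base example; real oligonucleotides would be about 15 to 30 bases.
antisense = reverse_complement("ATGGCC")
```

Here `antisense` is `"GGCCAT"`, which pairs base for base with the input when the two strands are aligned antiparallel.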

[0336] XII. Expression of PRTS

[0337] Expression and purification of PRTS is achieved using bacterial or virus-based expression systems. For expression of PRTS in bacteria, cDNA is subcloned into an appropriate vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac) hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g., BL21 (DE3). Antibiotic resistant bacteria express PRTS upon induction with isopropyl beta-D-thiogalactopyranoside (IPTG). Expression of PRTS in eukaryotic cells is achieved by infecting insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus (AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is replaced with cDNA encoding PRTS by either homologous recombination or bacterial-mediated transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases. Infection of the latter requires additional genetic modifications to baculovirus. (See Engelhard, E. K. et al. (1994) Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945.)

[0338] In most expression systems, PRTS is synthesized as a fusion protein with, e.g., glutathione S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step, affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a 26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from PRTS at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman Kodak). 6-His, a stretch of six consecutive histidine residues, enables purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are discussed in Ausubel (1995, supra, ch. 10 and 16). Purified PRTS obtained by these methods can be used directly in the assays shown in Examples XVI, XVII, XVIII, and XIX where applicable.

[0339] XIII. Functional Assays

[0340] PRTS function is assessed by expressing the sequences encoding PRTS at physiologically elevated levels in mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing a strong promoter that drives high levels of cDNA expression. Vectors of choice include PCMV SPORT (Life Technologies) and PCR3.1 (Invitrogen, Carlsbad Calif.), both of which contain the cytomegalovirus promoter. 5-10 μg of recombinant vector are transiently transfected into a human cell line, for example, an endothelial or hematopoietic cell line, using either liposome formulations or electroporation. 1-2 μg of an additional plasmid containing sequences encoding a marker protein are co-transfected. Expression of a marker protein provides a means to distinguish transfected cells from nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector. Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; Clontech), CD64, or a CD64-GFP fusion protein. Flow cytometry (FCM), an automated, laser optics-based technique, is used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of the cells and other cellular properties. FCM detects and quantifies the uptake of fluorescent molecules that diagnose events preceding or coincident with cell death. These events include changes in nuclear DNA content as measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to the cell surface. Methods in flow cytometry are discussed in Ormerod, M. G. (1994) Flow Cytometry, Oxford, New York N.Y.

[0341] The influence of PRTS on gene expression can be assessed using highly purified populations of cells transfected with sequences encoding PRTS and either CD64 or CD64-GFP. CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Lake Success N.Y.). mRNA can be purified from the cells using methods well known by those of skill in the art. Expression of mRNA encoding PRTS and other genes of interest can be analyzed by northern analysis or microarray techniques.

[0342] XIV. Production of PRTS Specific Antibodies

[0343] PRTS substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g., Harrington, M. G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to immunize rabbits and to produce antibodies using standard protocols.

[0344] Alternatively, the PRTS amino acid sequence is analyzed using LASERGENE software (DNASTAR) to determine regions of high immunogenicity, and a corresponding oligopeptide is synthesized and used to raise antibodies by means known to those of skill in the art. Methods for selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well described in the art. (See, e.g., Ausubel, 1995, supra, ch. 11.)

[0345] Typically, oligopeptides of about 15 residues in length are synthesized using an ABI 431A peptide synthesizer (Applied Biosystems) using FMOC chemistry and coupled to KLH (Sigma-Aldrich, St. Louis Mo.) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g., Ausubel, 1995, supra.) Rabbits are immunized with the oligopeptide-KLH complex in complete Freund's adjuvant. Resulting antisera are tested for antipeptide and anti-PRTS activity by, for example, binding the peptide or PRTS to a substrate, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with radio-iodinated goat anti-rabbit IgG.

[0346] XV. Purification of Naturally Occurring PRTS Using Specific Antibodies

[0347] Naturally occurring or recombinant PRTS is substantially purified by immunoaffinity chromatography using antibodies specific for PRTS. An immunoaffinity column is constructed by covalently coupling anti-PRTS antibody to an activated chromatographic resin, such as CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is blocked and washed according to the manufacturer's instructions.

[0348] Media containing PRTS are passed over the immunoaffinity column, and the column is washed under conditions that allow the preferential adsorption of PRTS (e.g., high ionic strength buffers in the presence of detergent). The column is eluted under conditions that disrupt antibody/PRTS binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such as urea or thiocyanate ion), and PRTS is collected.

[0349] XVI. Identification of Molecules Which Interact with PRTS

[0350] PRTS, or biologically active fragments thereof, are labeled with ¹²⁵I Bolton-Hunter reagent. (See, e.g., Bolton, A. E. and W. M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules previously arrayed in the wells of a multi-well plate are incubated with the labeled PRTS, washed, and any wells with labeled PRTS complex are assayed. Data obtained using different concentrations of PRTS are used to calculate values for the number, affinity, and association of PRTS with the candidate molecules.

[0351] Alternatively, molecules interacting with PRTS are analyzed using the yeast two-hybrid system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially available kits based on the two-hybrid system, such as the MATCHMAKER system (Clontech).

[0352] PRTS may also be used in the PATHCALLING process (CuraGen Corp., New Haven Conn.) which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S. Pat. No. 6,057,101).

[0353] XVII. Demonstration of PRTS Activity

[0354] Protease activity is measured by the hydrolysis of appropriate synthetic peptide substrates conjugated with various chromogenic molecules in which the degree of hydrolysis is quantified by spectrophotometric (or fluorometric) absorption of the released chromophore (Beynon, R. J. and J. S. Bond (1994) Proteolytic Enzymes: A Practical Approach, Oxford University Press, New York, N.Y., pp. 25-55). Peptide substrates are designed according to the category of protease activity as endopeptidase (serine, cysteine, aspartic proteases, or metalloproteases), aminopeptidase (leucine aminopeptidase), or carboxypeptidase (carboxypeptidases A and B, procollagen C-proteinase). Commonly used chromogens are 2-naphthylamine, 4-nitroaniline, and furylacrylic acid. Assays are performed at ambient temperature and contain an aliquot of the enzyme and the appropriate substrate in a suitable buffer. Reactions are carried out in an optical cuvette, and the increase/decrease in absorbance of the chromogen released during hydrolysis of the peptide substrate is measured. The change in absorbance is proportional to the enzyme activity in the assay.

[0355] In the alternative, an assay for protease activity takes advantage of fluorescence resonance energy transfer (FRET) that occurs when one donor and one acceptor fluorophore with an appropriate spectral overlap are in close proximity. A flexible peptide linker containing a cleavage site specific for PRTS is fused between a red-shifted variant (RSGFP4) and a blue variant (BFP5) of Green Fluorescent Protein. This fusion protein has spectral properties that suggest energy transfer is occurring from BFP5 to RSGFP4. When the fusion protein is incubated with PRTS, the substrate is cleaved, and the two fluorescent proteins dissociate. This is accompanied by a marked decrease in energy transfer which is quantified by comparing the emission spectra before and after the addition of PRTS (Mitra, R. D. et al. (1996) Gene 173:13-17). This assay can also be performed in living cells. In this case the fluorescent substrate protein is expressed constitutively in cells and PRTS is introduced on an inducible vector so that FRET can be monitored in the presence and absence of PRTS (Sagot, I. et al. (1999) FEBS Letters 447:53-57).

[0356] XVIII. Identification of PRTS Substrates

[0357] Phage display libraries can be used to identify optimal substrate sequences for PRTS. A random hexamer followed by a linker and a known antibody epitope is cloned as an N-terminal extension of gene III in a filamentous phage library. Gene III codes for a coat protein, and the epitope will be displayed on the surface of each phage particle. The library is incubated with PRTS under proteolytic conditions so that the epitope will be removed if the hexamer codes for a PRTS cleavage site. An antibody that recognizes the epitope is added along with immobilized protein A. Uncleaved phage, which still bear the epitope, are removed by centrifugation. Phage in the supernatant are then amplified and undergo several more rounds of screening. Individual phage clones are then isolated and sequenced. Reaction kinetics for these peptide substrates can be studied using an assay in Example XVII, and an optimal cleavage sequence can be derived (Ke, S. H. et al. (1997) J. Biol. Chem. 272:16603-16609).

[0358] To screen for in vivo PRTS substrates, this method can be expanded to screen a cDNA expression library displayed on the surface of phage particles (T7SELECT™ 10-3 Phage display vector, Novagen, Madison, Wis.) or yeast cells (PYD1 yeast display vector kit, Invitrogen, Carlsbad, Calif.). In this case, entire cDNAs are fused between Gene III and the appropriate epitope.

[0359] XIX. Identification of PRTS Inhibitors

[0360] Compounds to be tested are arrayed in the wells of a multi-well plate in varying concentrations along with an appropriate buffer and substrate, as described in the assays in Example XVII. PRTS activity is measured for each well and the ability of each compound to inhibit PRTS activity can be determined, as well as the dose-response kinetics. This assay could also be used to identify molecules which enhance PRTS activity.
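The dose-response analysis can be sketched as follows, assuming each well yields one activity reading and that an uninhibited control reading is available. Percent inhibition is computed per concentration, and the concentration giving 50% inhibition is estimated by log-linear interpolation between the two bracketing wells. The function names, the interpolation choice, and the example readings are assumptions made for illustration; a fitted sigmoidal model may be preferable in practice.

```python
import math


def percent_inhibition(activity, uninhibited):
    """Percent reduction in activity relative to the uninhibited control."""
    return 100.0 * (1.0 - activity / uninhibited)


def estimate_ic50(concentrations, activities, uninhibited):
    """Estimate the IC50 by log-linear interpolation.

    Finds the first pair of adjacent concentrations whose percent inhibition
    brackets 50% and interpolates on a log10 concentration scale.
    Returns None if 50% inhibition is never bracketed.
    """
    points = sorted(zip(concentrations, activities))
    inhibition = [(c, percent_inhibition(a, uninhibited)) for c, a in points]
    for (c_lo, i_lo), (c_hi, i_hi) in zip(inhibition, inhibition[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            fraction = (50.0 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + fraction * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10 ** log_ic50
    return None


# Hypothetical plate readings (illustration only): concentrations in micromolar,
# activities in arbitrary units, uninhibited control activity of 100.
concentrations = [0.1, 1.0, 10.0, 100.0]
activities = [95.0, 75.0, 25.0, 5.0]
print(f"{estimate_ic50(concentrations, activities, 100.0):.2f}")  # prints 3.16
```

A compound that enhances activity would show negative percent inhibition, in which case `estimate_ic50` returns `None`.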

[0361] In the alternative, phage display libraries can be used to screen for peptide PRTS inhibitors. Candidates are found among peptides which bind tightly to a protease. In this case, multi-well plate wells are coated with PRTS and incubated with a random peptide phage display library or a cyclic peptide library (Koivunen, E. et al. (1999) Nature Biotech 17:768-774). Unbound phage are washed away and selected phage amplified and rescreened for several more rounds. Candidates are tested for PRTS inhibitory activity using an assay described in Example XVII.

[0362] Various modifications and variations of the described methods and systems of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with certain embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.

TABLE 1

Incyte       Polypeptide   Incyte           Polynucleotide   Incyte
Project ID   SEQ ID NO:    Polypeptide ID   SEQ ID NO:       Polynucleotide ID

5155802       1            5155802CD1       22               5155802CB1
71269782      2            71269782CD1      23               71269782CB1
7472651       3            7472651CD1       24               7472651CB1
7478251       4            7478251CD1       25               7478251CB1
2759385       5            2759385CD1       26               2759385CB1
4226182       6            4226182CD1       27               4226182CB1
5078962       7            5078962CD1       28               5078962CB1
7474340       8            7474340CD1       29               7474340CB1
7477287       9            7477287CD1       30               7477287CB1
2994162      10            2994162CD1       31               2994162CB1
3965293      11            3965293CD1       32               3965293CB1
4948403      12            4948403CD1       33               4948403CB1
7473165      13            7473165CD1       34               7473165CB1
7476667      14            7476667CD1       35               7476667CB1
7479166      15            7479166CD1       36               7479166CB1
3671788      16            3671788CD1       37               3671788CB1
7479181      17            7479181CD1       38               7479181CB1
6621372      18            6621372CD1       39               6621372CB1
4847254      19            4847254CD1       40               4847254CB1
5776350      20            5776350CD1       41               5776350CB1
7473300      21            7473300CD1       42               7473300CB1

[0363] TABLE 2

Polypeptide   Incyte           GenBank ID NO: or   Probability
SEQ ID NO:    Polypeptide ID   PROTEOME ID NO:     Score         Annotation

 1            5155802CD1       g7684607            0.0E+00       [f1][Homo sapiens] calpain 3; calcium activated neutral protease; CAPN3; CL1
                                                                 Weilbach, F. X. et al. (1999) Nervenarzt 70:89-100; Piechaczyk, M. Methods Mol Biol (2000) 144:297-307
 2            71269782CD1      g4539525            9.0E−45       [f1][Homo sapiens] NAALADase II protein
                                                                 Pangalos, M. N. et al. (1999) J. Biol. Chem. 274:8470-8483
 3            7472651CD1       g11244759           1.0E−144      [f1][Homo sapiens] ACO protease
                               g3649791            3.7E−67       [Homo sapiens] serine protease (TLSP)
                                                                 Yoshida, S. et al. (1998) Biochim. Biophys. Acta 1399:225-228
 4            7478251CD1       g3386523            1.0E−101      [f1][Homo sapiens] evolutionarily related interleukin-1beta converting enzyme
                                                                 Humke, E. W., Ni, J. and Dixit, V. M. (1998) J. Biol. Chem. 273:15702-15707
 5            2759385CD1       g3220154            0.0E+00       [5′ incom][Homo sapiens] ubiquitin hydrolyzing enzyme I
 6            4226182CD1       g1235672            1.0E−61       [f1][Homo sapiens] metalloprotease/disintegrin/cysteine-rich protein precursor
                                                                 Weskamp, G. et al. (1996) J. Cell. Biol. 132:717-726
 7            5078962CD1       g6469251            9.8E−51       [Streptomyces coelicolor A3(2)] methionine aminopeptidase (EC
 8            7474340CD1       g13429970           0.0E+00       [f1][Homo sapiens] membrane-type mosaic serine protease
                               g6648960            1.9E−38       [Mus musculus] mosaic serine protease epitheliasin
                                                                 Jacquinet, E. et al. (2000) FEBS Lett. 468:93-100
 9            7477287CD1       g9798662            1.0E−131      [f1][Suncus murinus] pepsinogen C
                               g7008023            2.1E−119      [Callithrix jacchus] pepsinogen C
                                                                 Kageyama, T. (2000) J. Biochem. 127:761-770
10            2994162CD1       g9581879            0.0E+00       [f1][Homo sapiens] disintegrin metalloproteinase with thrombospondin repeats
                               g4929478            1.2E−195      [Rattus norvegicus] a disintegrin and metalloproteinase with thrombospondin
11            3965293CD1       g2739433            9.0E−78       [f1][Mus musculus] hematopoietic-specific IL-2 deubiquitinating enzyme
                                                                 Zhu, Y. et al. (1997) J. Biol. Chem. 272:51-57
12            4948403CD1       g9651704            1.0E−168      [f1][Homo sapiens] carboxypeptidase B precursor
                               g203295             4.8E−97       [Rattus norvegicus] carboxypeptidase B
13            7473165CD1       g6467401            0.0E+00       [Mus musculus] soluble secreted endopeptidase delta
                                                                 Ikeda, K. et al. (1999) J. Biol. Chem. 274:32469-32477
14            7476667CD1       g13560797           0.0E+00       [f1][Homo sapiens] ubiquitin specific protease
                               g2655204            2.3E−30       [Mus musculus] ubiquitin-specific protease
15            7479166CD1       g200507             1.7E−60       [Mus musculus] protease-6
                                                                 Serafin, W. E. et al. (1991) J. Biol. Chem. 266:3847-
16            3671788CD1       g10303331           0.0E+00       [f1][Mus musculus] calpain 12
                               g2570158            4.9E−136      [Mus musculus] m-calpain large subunit
                                                                 Muta, T. et al. (1991) J. Biol. Chem. 266:3554-6561
17            7479181CD1       g217397             5.1E−53       [Tachypleus tridentatus] limulus factor C precursor
18            6621372CD1       g6651071            0.0E+00       [5′ incom][Homo sapiens] disintegrin and metalloproteinase domain 19
                                                                 Kurisaki, T. et al. (1998) Mech. Dev. 73:211-215
19            4847254CD1       g10303329           2.0E−76       [f1][Mus musculus] calpain 12
20            5776350CD1       g7673618            0.0E+00       [5′ incom][Mus musculus] ubiquitin specific protease
21            7473300CD1       g303704             1.0E−06       [f1][Mus musculus] P100 serine protease of Ra-reactive factor (RaRF)

[0364] TABLE 3 Incyte Amino Potential Potential Signature Analytical SEQPoly- Acid Phosphory- Glycosy- Sequences, Methods ID peptide Resi-lation lation Domains and NO: ID dues Sites Sites and Motifs Databases 15155802CD1 767 S154 S320 S322 N117 N223 signal_cleavage: M1-A15 SPSCANS329 S352 S375 N318 N367 S384 S496 S511 N480 N531 S527 S552 S557 S590S642 S655 S90 T13 T291 T361 T574 CALPAIN CATALYTIC DOMAIN BLAST_DOMODM01305|P20807| 19-581: T268-E534, S19-D294 CALPAIN CATALYTIC DOMAINBLAST_DOMO DM01305|S57196| 12-574: T268-E534, G21-Y272 CALPAIN CATALYTICDOMAIN BLAST_DOMO DM01305|P00789| 3-507: F61-R530 CALPAIN CATALYTICDOMAIN BLAST_DOMO DM01305|P07384| 11-517: F61-K529 PROTEASE CALPAINHYDROLASE BLAST_PRODOM SUBUNIT NEUTRAL THIOL LARGE CALCIUM ACTIVATEDPROTEINASE CANP PD001545: L74-T369 PROTEASE CALPAIN HYDROLASEBLAST_PRODOM SUBUNIT LARGE NEUTRAL THIOL CALCIUM ACTIVATED PROTEINASECANP PD001874: W381-E534 CALPAIN SUBUNIT PROTEASE BLAST_PRODOM NEUTRALCALCIUM BINDING CALCIUM ACTIVATED PROTEINASE CANP HYDROLASE LARGEPD002827: L666-I729 CALPAIN SUBUNIT CALCIUM BLAST_PRODOM BINDING NEUTRALPROTEASE CALCIUM ACTIVATED PROTEINASE CANP HYDROLASE LARGE PD003609:E595-F663 EF-hand calcium-binding BLIMPS_BLOCKS domain protein BL00018:D651-F663 Calpain cysteine protease BLIMPS_PRINTS (C2) family signaturePR00704: K59-P82, W99-I121, Q123-T139, Y159-T184, L189-L212, G214-I241,E345-C366, S395-Y412, R500-E528 Calpain family cysteine HMMER_PFAMprotease Peptidase_C2: L74-T369 EF hand: S642-I670, HMMER_PFAM A672-A700Calpain large subunit, HMMER_PFAM domain III Calpain_III: T380-E534,EF-Hand calcium binding MOTIFS domain:; D651-F663, D681-M693 EukaryoticThiol (cysteine) MOTIFS Proteases Active site: Q123-A134 2 71269782CD1574 S117 S180 S197 N10 N216 PROTEIN AMINOPEPTIDASE BLAST_PRODOM S255S267 S315 N295 ANTIGEN RECEPTOR S362 S366 S393 N373 N534 TRANSMEMBRANEMEMBRANE S404 S59 S92 CARBOXYPEPTIDASE TRANSFERRIN T271 T398 T44HYDROLASE T440 Y106 PROSTATE SPECIFIC PD001808: N410-T556, 
K179-S218transmembrane domain: HMMER I128-V146 3 7472651CD1 320 S166 S211 S220N235 N296 trypsin: L86-I313 HMMER_PFAM S226 S288 T153 T242 T297 Serineproteases, trypsin MOTIFS family active sites:; Trypsin_Histidine:L122-C127 TRYPSIN DM00018|P12788| BLAST_DOMO 23-243: K85-M317 TRYPSINDM00018|P00764| BLAST_DOMO 8-225: L86-M317 TRYPSIN DM00018|P35031|BLAST_DOMO 20-238: K85-M317 TRYPSIN DM00018|S49489| BLAST_DOMO 21-238:L86-M317 PROTEASE SERINE PRECURSOR BLAST_PRODOM SIGNAL HYDROLASE ZYMOGENGLYCOPROTEIN FAMILY MULTIGENE FACTOR PD000046: R133-I313, L86-Y248Serine proteases, trypsin BLIMPS_BLOCKS family, histidine proteinsBL00134: C111-C127, E267-G290, P300-I313 Type I fibronectin domainBLIMPS_BLOCKS proteins BL01253: C111-A124, A266-C279 Kringle domainproteins. BLIMPS_BLOCKS BL00021: C111-Q128 Chymotrypsin serine proteaseBLIMPS_PRINTS family (S1) signature PR00722: G112-C127, S166-A180,A266-V278 Serine proteases, trypsin PROFILESCAN family, active sitesfor: Trypsin_Histidine: L103-P147; Trypsin-Serine: L252-D295 47478251CD1 378 S102 S154 S244 N152 N177 Caspase recruitment domainHMMER_PFAM S271 S313 S52 N311 N319 CARD: A2-S91 S79 T118 T134 T179 T20T232 Y125 Y147 Y170 ICE-like protease (caspase) HMMER_PFAM p20 domainICE_p20: K131- I264 ICE-like protease (caspase) HMMER_PFAM p10 domainICE_p10: A291- P376, INTERLEUKIN-1 BETA BLAST_DOMO CONVERTING ENZYMEFAMILY HISTIDINE DM01067| P49662|97-280: Q97- W266 INTERLEUKIN-1 BETABLAST_DOMO CONVERTING ENZYME FAMILY HISTIDINE DM01067| B57511|138-321:Q97-W266 INTERLEUKIN-1 BETA BLAST_DOMO CONVERTING ENZYME FAMILYHISTIDINE DM01067| P51878|138-321: Q97-W266 INTERLEUKIN-1 BETABLAST_DOMO CONVERTING ENZYME FAMILY HISTIDINE DM01067| P29466|124-307:G103-G275 PRECURSOR PROTEASE HYDROLASE BLAST_PRODOM THIOL ZYMOGENAPOPTOSIS PROTEIN APOPTOTIC CASPASE1 CYSTEINE PD001408: K131-N260CASPASE12 PRECURSOR EC 3.4.22. 
BLAST_PRODOM HYDROLASE THIOL PROTEASEAPOPTOSIS ZYMOGEN PD103766: V11-K131 Caspase family histidineBLIMPS_BLOCKS proteins BL01121: L148-M183, E195-S210, C242-G259,K294-I328, L340-E352 INTERLEUKIN-1B CONVERTING BLIMPS_PRINTS ENZYMESIGNATURE PR00376: R133-N146, R151-G169, G169-L187, T202-S210,C242-N260, S313-I324, L366-F375 Caspase family active site: MOTIFSIce_Serine: K248-G259 5 2759385CD1 366 S17 S189 S190 N15 N178 Ubiquitincarboxyl-terminal HMMER_PFAM S216 S234 S271 N205 hydrolase family 1UCH-1: T131 T2 T285 N284 F35-Y66 T89 Y358 Ubiquitin carboxyl-terminalHMMER_PFAM hydrolase family 2 UCH-2: L292-S364 UBIQUITINCARBOXYL-TERMINAL BLAST_DOMO HYDROLASESFAMILY 2 DM00659| P39967|359-610:K72-G306 UBIQUITIN CARBOXYL-TERMINAL BLAST_DOMO HYDROLASES FAMILY 2DM00659| P40818|782-1103: G41-E341 PROTEASE UBIQUITIN HYDROLASEBLAST_PRODOM ENZYME UBIQUITINSPECIFIC CARBOXYLTERMINAL DEUBIQUITINATINGTHIOLESTERASE PROCESSING CONJUGATION PD000590: G36-S189 PROTEASEUBIQUITIN HYDROLASE BLAST_PRODOM UBIQUITINSPECIFIC ENZYMEDEUBIQUITINATING CARBOXYLTERMINAL THIOLESTERASE PROCESSING CONJUGATIONPD017412: S190-L282 Ubiquitin carboxyl-terminal BLIMPS_BLOCKS hydrolasefamily 2 BL00972: G36-L53, Y116-L125, V168-C182, Y296-S320, H321-E342Ubiquitin carboxyl-terminal MOTIFS hydrolase family 2 signatures;Uch_2_1: G36-Q51; Uch_2_2: Y296-Y314 6 4226182CD1 389 S138 S140 S215N213 N80 Disintegrin signature HMMER_PFAM S285 S291 S32 disintegrin:A22-C86 S337 S350 S369 S61 S82 S97 T173 T204 T363 T373 do ZINC;REGULATED; EPIDIDYMAL; BLAST_DOMO NEUTRAL; DM00591|S47656| 462-624:C79-A210 TRANSMEMBRANE METALLOPROTEASE BLAST_PRODOM SIGNAL PRECURSORPROTEIN GLYCOPROTEIN CELL FERTILIN BETA ADHESION PD001269: N94-I163 CELLADHESION PLATELET BLOOD BLAST_PRODOM COAGULATION VENOM DISINTEGRINMETALLOPROTEASE PRECURSOR SIGNAL PD000664: C28-C86 DISINTEGRIN SIGNATUREPR00289: BLIMPS_PRINTS C47-R66, E77-D89 transmembrane domain: HMMERW298-A318 Disintegrins signature PROFILESCAN disintegrins.prf: G8-D89 75078962CD1 217 T2 T203 N151 
metallopeptidase family M24 HMMER_PFAMPeptidase_M24: M1-Q208 AMINOPEPTIDASE HYDROLASE BLAST_PRODOM METHIONINEPEPTIDASE PROTEIN COBALT M DIPEPTIDASE XPRO MAP PD000555: E4-D181Aminopeptidase P and proline BLIMPS_BLOCKS dipeptidase proteinsBL00491C: M157-E171 Methionine aminopeptidase BLIMPS_BLOCKS subfamily 1BL00680: D55-F76 METHIONINE AMINOPEPTIDASE-1 BLIMPS_PRINTS SIGNATUREPR00599: V33-P46, D55-D71, F125-G137, L155-P167 METHIONINEAMINOPEPTIDASE BLAST_DOMO DM01530|Q01662| 123-375: M1-T211 Methionineaminopeptidase PROFILESCAN signature map.prf: I112-I168 8 7474340CD1 486S101 S252 S254 N250 N287 Trypsin family active site HMMER_PFAM S301 S391S96 N400 trypsin: I321-H438 T153 T289 T318 T349 T402 T428 TRYPSINDM00018|P26262| BLAST_DOMO 391-624: I321-P429 TRANSMEMBRANE PROTEASE,SERINE BLAST_PRODOM 2 EC 3.4.21. HYDROLASE PROTEASE SIGNALANCHORPD072395: P86-R320 PROTEASE SERINE PRECURSOR BLAST_PRODOM SIGNALHYDROLASE ZYMOGEN GLYCOPROTEIN FAMILY MULTIGENE FACTOR PD000046:I321-S463 Serine proteases, trypsin BLIMPS_BLOCKS family BL00134A:C346-C362 Kringle domain proteins BLIMPS_BLOCKS BL00021B: C346-F363CHYMOTRYPSIN SERINE PROTEASE BLIMPS_PRINTS PR00722: E405-L419, G347-C362transmembrane domain: HMMER L163-W184 Trypsin family active site MOTIFSTrypsin_His: L357-C362 Serine proteases, trypsin PROFILESCAN family,active sites trypsin_his.prf: W334-A389 9 7477287CD1 390 S164 S175 S27N311 Eukaryotic aspartyl protease HMMER_PFAM S375 T123 asp: P65-V89,P101-S389 EUKARYOTIC AND VIRAL ASPARTYL BLAST_DOMO PROTEASESDM00126|P20142| 17-386: R19-A387 PROTEASE ASPARTYL HYDROLASEBLAST_PRODOM PRECURSOR SIGNAL ZYMOGEN GLYCOPROTEIN ASPARTIC PROTEINASEMULTIGENE PD000182: P65-A387 Eukaryotic and viral aspartyl BLIMPS_BLOCKSprotease BL00141: D178- A189, G230-G239, I364-A387 PEPSIN (A1) ASPARTICPROTEASE BLIMPS_PRINTS PR00792: T80-L100, G225-s238, A275-V286,2363-D378 Signal_peptide HMMER 10 2994162CD1 1916 S122 S171 S27 N116N252 Reprolysin (M12B) family zinc HMMER_PFAM S400 S460 S59 N730 
N821metalloprotease Reprolysin: S732 S781 S782 N93 N1194 R274-P480 S811 S924S947 N1248 S968 T139 T156 N1769 T199 T220 T25 N1787 T262 T266 T344 T370T391 T53 T545 T758 T771 T815 T823 T893 T914 T953 T998 T1155 T1159 T1008T1019 S1122 S1189 S1196 S1257 T1267 S1329 T1343 S1393 S1455 T1509 T1522S1526 T1539 T1551 S1579 S1619 T1625 T1661 T1687 T1707 S1789 T1840 S1865S1869 T1909 Y164 Y1263 Y1521 Reprolysin family propeptide HMMER_PFAMPep_M12B_propep: N93-R223 Thrombospondin type 1 domain; HMMER_PFAMtsp_1: G570-C623, W1313- C1364, W1426-C1479 Neutral zincmetallopeptidase BLIMPS_BLOCKS BL00142: T412-N422 do ZINC;METALLOPEPTIDASE; BLAST_DOMO NEUTRAL; ATROLYSIN; DM00368|S60257|204-414: L270-E481 METALLOPROTEASE PRECURSOR BLAST_PRODOMHYDROLASE SIGNAL ZINC VENOM CELL PROTEIN TRANSMEMBRANE ADHESIONPD000791: L270-P480 PROTEIN PROCOLLAGEN BLAST_PRODOM THROMBOSPONDINMOTIFS NPROTEINASE A DISINTEGRIN METALLOPROTEASE WITH ADAMTS1; PD014161:K734-I851; PD011654: I661-C733 Zinc_Protease: T412-F421 MOTIFS 113965293CD1 314 S22 S23 S272 N92 Ubiquitin carboxyl-terminal HMMER_PFAMS284 S294 S311 hydrolases family 1 UCH-1: S36 S71 S72 A80-R111 T47Ubiquitin carboxyl-terminal BLIMPS_BLOCKS hydrolases family 2 BL00972:G81-L98, G156-L165, I193-C207 UBIQUITIN CARBOXYL-TERMINAL BLAST_DOMOHYDROLASES FAMILY 2; DM00659| P50102|141-420: Q158-F283; DM00659|Q09738|149-388: N84-F283 PROTEASE UBIQUITIN HYDROLASE BLAST_PRODOMENZYME UBIQUITINSPECIFIC CARBOXYLTERMINAL DEUBIQUITINATING THIOLESTERASEPROCESSING CONJUGATION; PD000590: L62-H120, F153-T216; PD017412:F217-F283 Ubiquitin carboxyl-terminal MOTIFS hydrolases family 2Uch_2_1: G81-Q96 12 4948403CD1 437 S141 S299 S335 N153 N427 Zinccarboxypeptidase HMMER_PFAM S381 S60 T124 N89 Zn_carbOpept: Y139-E420T216 T417 T49 T80 Y352 Y54 Zinc carboxypeptidases, zinc- BLIMPS_BLOCKSbinding regions BL00132: Y139-L179, R187-W200, Y217-R257, S261-K275,P287-H313, H316-K337, T373-G390 ZINC CARBOXYPEPTIDASES, ZINC- BLAST_DOMOBINDING REGION 1 DM00683| P19223|107-414: S132-L432 
CARBOXYPEPTIDASEPRECURSOR BLAST_PRODOM SIGNAL HYDROLASE ZINC ZYMOGEN PROTEINGP180CARBOXYPEPTIDASE PD001916: Y139-F344, CARBOXYPEPTIDASE ABLIMPS_PRINTS METALLOPROTEASE FAMILY SIGNATURE PR00765: I165-L177,R187-I201, G267-K275, L321-Y334 Carboxypept_Zn_2: H324-Y334 MOTIFS Zinccarboxypeptidases, zinc- PROFILESCAN binding regions signaturescarboxypept_zn_2.prf: E302-L358 signal_cleavage: M1-S30 SPSCAN 137473165CD1 742 S102 S144 S151 N121 N142 Peptidase family M13 HMMER_PFAMS209 S234 S326 N172 N208 Peptidase_M13: N535-V741 S356 S377 S410 N315N494 S431 S457 S467 N601 N620 S515 S689 S698 T123 T394 T446 T636 Y407Y490 Neutral zinc metallopeptidases BLIMPS_BLOCKS BL00142: V573-D583NEPRILYSIN DM02569|P08473| BLAST_DOMO 11-748: L20-W742 PROTEIN ZINCMETALLOPROTEASE BLAST_PRODOM HYDROLASE TRANSMEMBRANE GLYCOPROTEINSIGNALANCHOR ENDOPEPTIDASE NEUTRAL ENZYME; PD001606: E240-P692;PD002031: A62-F245 NEPRILYSIN METALLOPROTEASES BLIMPS_PRINTS PR00786:L527-S539, I545-F557, N566-F582, E639-A650 Zinc_Protease: V573-F582MOTIFS transmem domain: L20-Y38 HMMER signal_cleavage: M1-V32 SPSCAN 147476667CD1 582 S203 S222 S273 N32 N468 Ubiquitin carboxyl-terminalHMMER_PFAM S328 S350 S357 N520 hydrolase family 2 UCH-2: S358 S367 S376I484-Q544 S400 S432 S44 S470 S474 S523 S565 S566 S71 T134 T188 T221 T244T29 T438 T6 T91 Ubiquitin carboxyl-terminal BLIMPS_BLOCKS hydrolasefamily BL00972: I487-N511, N513-T534 do UBIQUITIN; TRANSFORMING;BLAST_DOMO HYDROLASE; TERMINAL; DM08764| P35125|548-820: L45-R318UBIQUITIN CARBOXYL-TERMINAL BLAST_DOMO HYDROLASES FAMILY 2; DM00659|P40818|782-1103: A206-D294, Y488-L540; DM00521|P35125| 1007-1051:L500-Q545 UBIQUITIN CARBOXYLTERMINAL BLAST_PRODOM HYDROLASE 6THIOLESTERASE UBIQUITINSPECIFIC PROCESSING PROTEASE DEUBIQUITINATINGENZYME PROTOONCOGENE TRE2 CONJUGATION THIOL MULTIGENE FAMILY; PD085597:R378-I487; PD038816: I55-S203; PD119604: M1-I54; PD085589: C524-Q582Uch_2_2 Y488-Y505 MOTIFS 15 7479166CD1 290 S250 S54 S91 N150 N209Trypsin active sites trypsin: HMMER_PFAM T264 
Y133 I75-S177, P186-I282Serine proteases, trypsin BLIMPS_BLOCKS family BL00134: C106-C122,D233-V256, P269-I282 Type I fibronectin domain BLIMPS_BLOCKS BL01253:C106-A119, R232-C245, W251-Q285 Kringle domain proteins BL00021:BLIMPS_BLOCKS C106-I123, G241-I282 CHYMOTRYPSIN SERINE PROTEASESBLIMPS_PRINTS PR00722: G107- C122, G164-P178, R232-V244 TRYPSINDM00018|P21845| BLAST_DOMO 31-271: G74-P186, E182-V286 PROTEASE SERINEPRECURSOR BLAST_PRODOM SIGNAL HYDROLASE ZYMOGEN GLYCOPROTEIN FAMILYMULTIGENE FACTOR PD000046: P187-I282, I75-S180 Trypsin family activesites: MOTIFS Trypsin_His: L117-C122; Trypsin_Ser: D233-V244 Serineproteases, trypsin PROFILESCAN family, active sites trypsin_his.prf:A103-G147; trypsin_ser.prf: I220-L265 signal_cleavage: M1-A60 SPSCAN 163671788CD1 708 S244 S488 S5 N556 Calpain family cysteine HMMER_PFAM S67S93 T266 protease Peptidase_C2: T388 T421 T459 L45-S341 T461 T492 T577Calpain large subunit, HMMER_PFAM domain III Calpain_III: G353-A499CALPAIN CYSTEINE PROTEASE BLIMPS_PRINTS PR00704: Q30-A53, W75-V97,Q99-T115, Y135-V160, L165-L188, G190-L217, E317-C338, N368-F385 PROTEASECALPAIN HYDROLASE BLAST_PRODOM SUBUNIT NEUTRAL THIOL LARGECALCIUMACTIVATED PROTEINASE PD001545: L45-S341; PD002827: L607-V670;PD001874: W354-E401, C424-Y491 CALPAIN CATALYTIC DOMAIN; BLAST_DOMODM01305|P17655|1- 505: D14-G402, C424-N463 CALPAIN CATALYTIC DOMAIN;BLAST_DOMO DM01305|A48764|1- 507: M1-G402, G418-Q454 Cysteine proteaseactive site MOTIFS Thiol_Protease_Cys: Q99-A110 EF hand calcium bindingdomain; MOTIFS Ef_Hand: D622-L634 17 7479181CD1 649 S257 S353 S354 N380N543 Trypsin active site trypsin: HMMER_PFAM S365 S402 S502 N96W391-I644 S519 S552 S571 S627 S93 T102 T318 T361 T545 T86 CUB domainCUB: C128-Y233 HMMER_PFAM EGF-like domain; EGF: HMMER_PFAM C239-C271Serine proteases, trypsin PROFILESCAN family, active sitestrypsin_his.prf: K411-E464 Serine proteases, trypsin BLIMPS_BLOCKSfamily (p < 0.0012); BL00134: C418-C434, S631-I644 CUB domain proteins;BL01180B: BLIMPS_BLOCKS 
C177-G187 (p < 0.13) Kringle domain proteins;BLIMPS_BLOCKS BL00021B: C418-V435 (p < 0.087) Type II EGF-likesignature; BLIMPS_PRINTS PR00010: E235-H246, G256-Y266, T267-N273CHYMOTRYPSIN SERINE PROTEASE; BLIMPS_PRINTS PR00722: S419- C434,L485-A499 Sushi domain proteins (Short BLIMPS_PFAM consensus repeat)PF00084: H336-F347, G362-C371 TRYPSIN DM00018|P28175| BLAST_DOMO759-1018: R390-R646 PROTEASE SERINE PRECURSOR BLAST_PRODOM SIGNALHYDROLASE ZYMOGEN GLYCOPROTEIN FAMILY MULTIGENE FACTOR PD000046:W391-I644 signal_peptide: M1-A32 HMMER EGF-like domain; Egf: MOTIFSC260-C271 18 6621372CD1 918 S208 S284 S364 N144 N444 Reprolysin (M12B)family zinc HMMER_PFAM S38 S647 S787 N447 N645 metallopeptidaseReprolysin: S823 S830 S831 K210-P408 S90 S907 S915 T105 T106 T118 T131T182 T194 T449 T488 T504 T520 Reprolysin family propeptide; HMMER_PFAMPep_M12B_propep: D79- K195 Disintegrin signature; HMMER_PFAMdisintegrin: E425-Q500 Disintegrin signature; PROFILESCANdisintegrins.prf: E436-P495 Neutral zinc metallopeptidases, PROFILESCANzinc-binding region signature zinc_protease.prf: S325-G377 Neutral zincmetallopeptidases; BLIMPS_BLOCKS BL00142: T342-G352 DISINTEGRINSIGNATURE; PR00289: BLIMPS_PRINTS C456-R475, E485-N497 NEPRILYSINMETALLOPROTEASE; BLIMPS_PRINTS PR00786C: N335-F351 MELTRIN, BETABLAST_PRODOM METALLOPROTEASEDISINTEGRIN MELTRIN BETA INTEGRIN PROTEASEMETALLOPROTEASE PD105322: P696-G888; PD171676: K571-C643 METALLOPROTEASEPRECURSOR BLAST_PRODOM HYDROLASE SIGNAL ZINC VENOM CELL PROTEINTRANSMEMBRANE ADHESION PD000791: K210-P408 CELL ADHESION PLATELET BLOODBLAST_PRODOM COAGULATION VENOM DISINTEGRIN METALLOPROTEASE PRECURSORSIGNAL PD000664: E425-Y499 do ZINC; METALLOPEPTIDASE; BLAST_DOMONEUTRAL; ATROLYSIN; DM00368| S60257|204-414: K202-D409 do ZINC;REGULATED; EPIDIDYMAL; BLAST_DOMO NEUTRAL; DM00591|S60257| 492-628:F486-L625 Zinc_Protease: T342-F351 MOTIFS transmembrane domain: HMMERV700-Y721 signal_cleavage: M1-P22 SPSCAN 19 4847254CD1 218 T164 T207 T49N28 CALPAIN CATALYTIC DOMAIN; 
BLAST_DOMO DM01221|P20807|719- 819:L117-F217; DM01221| S57196|708-808: L117-F217; DM01221| P00789|602-702:L117-M212; DM01221| P07384| 612-712: L117-F217 CALPAIN SUBUNIT PROTEASEBLAST_PRODOM NEUTRAL CALCIUMBINDING CALCIUMACTIVATED PROTEINASE CANPHYDROLASE LARGE PD002827: L117-V180 Calcium binding domain MOTIFSEf_Hand: D132-L144 Calcium binding domain HMMER_PFAM efhand: E123-A151signal_cleavage: M1-T47 SPSCAN 20 5776350CD1 656 S141 S145 S22 N16Ubiquitin carboxyl-terminal HMMER_PFAM S272 S279 S301 hydrolase family1; UCH-1: S338 S410 S483 R308-D339 S493 S510 S520 S524 S572 S624 S95 S99T107 T171 T204 T260 T451 T502 T529 Ubiquitin carboxyl-terminalHMMER_PFAM hydrolase family 2; UCH-2: N590-K650 Ubiquitincarboxyl-terminal BLIMPS_BLOCKS hydrolase family 2; BL00972: G309-L326,Y390-L399, I429-C443, K593-Q617, K619-Y640 UBIQUITIN CARBOXYL-TERMINALBLAST_DOMO HYDROLASES FAMILY 2; DM00659| P40818|782-1103: S493-L646,L313-N421, I428-L463 PROTEASE UBIQUITIN HYDROLASE BLAST_PRODOMUBIQUITINSPECIFIC ENZYME DEUBIQUITINATING CARBOXYLTERMINAL THIOLESTERASEPROCESSING CONJUGATION PD017412: S493-P583 Ubiquitin carboxyl-terminalMOTIFS hydrolase family 1 Uch_2_1: G309-Q324 Ubiquitin carboxyl-terminalMOTIFS hydrolase family 2 Uch_2_2: Y594-Y611 21 7473300CD1 509 S137 S156S488 N253 N33 Trypsin family serine protease HMMER_PFAM T130 T163 T32N394 active site; trypsin: K279- T37 T41 Y286 F358 Trypsin family serineprotease PROFILESCAN active site; trypsin_his.prf: I297-P343 Trypsinfamily serine protease BLIMPS_BLOCKS active site; BL00134A: C305- C321Kringle domain proteins BL00021B: BLIMPS_BLOCKS C305-V322 CHYMOTRYPSINSERINE PROTEASE BLIMPS_PRINTS ACTIVE SITE; PR00722A: S306-C321 Trypsinfamily serine protease MOTIFS active site Trypsin_His: L316-C321

[0365] TABLE 4 Polynu- cleotide SEQ ID Incyte Sequence Selected Sequence5′ 3′ NO: ID Length Fragments Fragments Position Position 22 5155802CB12789 1-1939 71666762V1 1728 2444 71668725V1 1024 1733 8001825H1 1 383(LNODTUC02) 71668385V1 248 960 8089190H1 2133 2789 (BRACDIK08)71667190V1 928 1658 70239197V1 1712 2237 70235564V1 472 1009 2371269782CB1 2267 1701-2267 70900108V1 586 1167 70899845V1 1716 223471269782V1 1286 1963 GBI.g8567524_edit 1142 2267 2779031F6 10 573(OVARTUT03) g7377067 1 396 70899669V1 360 977 71874795V1 1142 1708 247472651CB1 963 720-801, FL7472651_(—) 1 963 1-665, g7689999_(—) 838-912000022__g3649791 25 7478251CB1 1137 1-489, 72001656V1 779 1137 779-876g8117619_edit_1 1 80 g8117619_edit_2 256 778 72004235V1 3 261 262759385CB1 3204 2123-2558, 6983266H1 845 1382 1-72, (BRAIFER05) 505-529,3127-3204 1275720T6 2453 3117 (TESTTUT02) 2759385F6 1329 1748(THP1AZS08) 7168141H1 413 929 (MCLRNOC01) 3690313F6 1 475 (HEAANOT01)2732484H1 2934 3126 (OVARTUT04) 7380327H1 1570 2128 (ENDMUNE01) 647852H1686 948 (CARCTXT02) 4520886H1 2950 3204 (SINJNOT03) 659258R6 2006 2493(BRAINOT03) 6263739H1 2220 2556 (MCLDTXN03) 2759385R6 949 1563(THP1AZS08) 27 4226182CB1 1641 1-696 645682T6 984 1631 (BRSTTUT02)5015693F6 372 1008 (BRAXNOT03) 55062402J1 1 545 71975126V1 655 1054645682F1 1068 1641 (BRSTTUT02) 28 5078962CB1 1983 1-319, 2937276F6 7801385 1809-1983 (THYMFET02) 55058283J2 1 761 6473257H1 1050 1723(PLACFEB01) 8118369H1 686 1341 (TONSDIC01) 6508675H1 1477 1983(BRAHNOT02) 29 7474340CB1 1574 1-37 5558974T9 829 1350 (TONSDIT01)55068051J1 426 1098 g2056077 1134 1574 55068054J1 1 602 30 7477287CB11173 1-732, g8546678_edit_01 1 100 1112-1173, 834-1071 g8546678_edit_02225 1173 825016H1_edit_1 55 224 (PROSNOT06) 31 2994162CB1 60135667-6013, 3071581H1 3391 3614 2770-4197, (UTRSNOR01) 683-2187, 1-103,219-247 7122715H1 2267 2792 (BRAHNOE01) 71229995V1 5281 5880 7992663H14366 5039 (UTRSDIC01) 6177981F6 145 777 (BMARUNT02) 70867656V1 5401 60136706152H1 4733 5393 (HEAADIR01) 496053H1 
2881 3243 (HNT2NOT01)g7242978_CD 433 4914 5301201H1 2035 2300 (MUSCNOT11) 7407622H1 405 952(UTREDME05) 7606552H1 3834 4394 (COLRTUE01) 7272409H1 3571 4162(OVARDIJ01) 7090903F6 1153 1733 (BRAUTDR03) 7100145R6 1248 2171(BRAWTDR02) 55062765H1 1 245 7100145F6 798 1652 (BRAWTDR02) 7728093J12412 3040 (UTRCDIE01) 32 3965293CB1 1393 397-1002 3965293F6 1 858(PROSNOT14) 71832720V1 651 1393 33 4948403CB1 1993 1654-1687, 4600759H11025 1282 1-123, (COLSTUT01) 850-1300 71982269V1 1420 1993 5763587T7 6571179 (PROSBPT02) 70484250V1 1180 1790 GBI.g8080699_000017_(—) 528 974000013.edit 5763587F7 1 473 (PROSBPT02) 7930210H1 116 619 (COLNDIS02) 347473165CB1 2318 1-1362, 2250635H1 2193 2318 1756-2138 (OVARTUT01)GBI.g9367391_(—) 1848 2318 000005_(—) 000006.edit FL7473165- 1020 1259g7329540_(—) 000015- g6467401 55072914H1 272 891 55073757J1 1 46555062846H1 452 1124 GBI:g8039388_(—) 1161 1982 000002.edit 35 7476667CB11931 1909-1931 337733R6 1418 1931 (EOSIHET02) 1608234T6 1301 1930(LUNGNOT15) 71729901V1 678 1385 71734439V1 608 1345 55027506H1 1 687(ADMEDNV30) 36 7479166CB1 1218 1-299, g4394411 764 1218 369-666,1020-1057, 739-762 GNN.g7635593_(—) 1 873 000002_006 37 3671788CB1 26791-1760 72038124V1 1950 2679 6198936H1 1721 2372 (PITUNON01) 3671788T7348 864 (KIDNTUT16) 6431661H1 1792 2390 (LUNGNON07) 526464H1 1680 1777(EOSINOT02) 37 GBI.g8576128_(—) 131 2257 000022_(—) 000025.edit2579533T6 1 439 (KIDNTUT13) 7729129H1 586 1196 (UTRCDIE01) 38 7479181CB12632 1-1603 1681388F7 2423 2632 (STOMFET01) 8113752H1 1 515 (OSTEUNC01)71510880V1 1282 2009 70737244V1 448 1028 71509933V1 1892 2626 7245927H12037 2628 (PROSTMY01) 70733946V1 575 1238 71511332V1 1216 1920 396621372CB1 2757 2517-2757, 7715927J1 781 1531 430-1288 (SINTFEE02)5456122H1 2606 2757 (SINITUT03) 6887315F6 1700 2324 (BRAITDR03)7372052H2 2235 2722 (BRAIFEE04) g6651070_CD 293 2705 GBI.g7709272_(—) 12757 g6651070_(—) g7709257_edit 7723192J2 1096 1691 (THYRDIE01)8037549H1 397 1010 (SMCRUNE01) 8037549J1 1647 2311 (SMCRUNE01) 404847254CB1 1892 
1-764, 4847254F8 529 1173 1773-1892, (SPLNTUT02)918-1029 GBI.g8576128.edit 1 769 72038106V1 951 1892 41 5776350CB1 31721036-1253, 71397725V1 1638 2301 747-802, 82-257, 2389-3172, 1401-14377741938H1 496 913 (THYMNOE01) GBI.g4034471.edit.1 1 638 ( ) 7741938J1798 1533 (THYMNOE01) 3400685H1 2579 2813 (UTRSNOT16) g5836340 289 73871164543V1 1693 2371 3992505T6 2038 2650 (LUNGNON03) 71761861V1 955 17053042523F6 2585 3172 (HEAANOT01) 42 7473300CB1 1997 1-467,FL7473300CB1_(—) 1 1997 523-1997 00002

[0366] TABLE 5

Polynucleotide   Incyte
SEQ ID NO:       Project ID:    Representative Library

22               5155802CB1     BONRFEC01
23               71269782CB1    OVARTUT03
26               2759385CB1     TESTTUT02
27               4226182CB1     BRSTTUT02
28               5078962CB1     BRABDIK02
29               7474340CB1     TONSDIT01
30               7477287CB1     PROSNOT06
31               2994162CB1     HEAADIR01
32               3965293CB1     PROSNOT14
33               4948403CB1     PROSTMC01
34               7473165CB1     BRAENOT02
35               7476667CB1     EOSIHET02
37               3671788CB1     PGANNOT01
38               7479181CB1     PLACNOT02
39               6621372CB1     THYRDIE01
40               4847254CB1     SPLNTUT02
41               5776350CB1     LUNGNON03

[0367] TABLE 6

Library     Vector        Library Description

BONRFEC01   pINCY         This large size-fractionated library was constructed using RNA isolated from rib bone tissue removed from a Caucasian male fetus who died from Patau's syndrome (trisomy 13) at 20 weeks' gestation. Serologies were negative.

BRABDIK02   PSPORT1       This amplified and normalized library was constructed using pooled cDNA from three different donors. cDNA was generated using mRNA isolated from diseased vermis tissue removed from a 79-year-old Caucasian female (donor A) who died from pneumonia, an 83-year-old Caucasian male (donor B) who died from congestive heart failure, and an 87-year-old Caucasian female (donor C) who died from esophageal cancer. Pathology indicated severe Alzheimer's disease in donors A & B and moderate Alzheimer's disease in donor C. Patient history included glaucoma, pseudophakia, gastritis with gastrointestinal bleeding, peripheral vascular disease, chronic obstructive pulmonary disease, seizures, tobacco abuse in remission, and transitory ischemic attacks in donor A; Parkinson's disease and atherosclerosis in donor B; hypertension, coronary artery disease, cerebral vascular accident, and hypothyroidism in donor C. Family history included Alzheimer's disease in the mother and sibling(s) of donor A. Independent clones from this amplified library were normalized in one round using conditions adapted from Soares et al., PNAS (1994) 91:9228-9232 and Bonaldo et al., Genome Research 6 (1996):79

BRAENOT02   pINCY         Library was constructed using RNA isolated from posterior parietal cortex tissue removed from the brain of a 35-year-old Caucasian male who died from cardiac failure.

BRSTTUT02   PSPORT1       Library was constructed using RNA isolated from breast tumor tissue removed from a 54-year-old Caucasian female during a bilateral radical mastectomy with reconstruction. Pathology indicated residual invasive grade 3 mammary ductal adenocarcinoma. The remaining breast parenchyma exhibited proliferative fibrocystic changes without atypia. One of 10 axillary lymph nodes had metastatic tumor as a microscopic intranodal focus. Patient history included kidney infection and condyloma acuminatum. Family history included benign hypertension, hyperlipidemia, and a malignant colon neoplasm.

EOSIHET02   PBLUESCRIPT   Library was constructed using RNA isolated from peripheral blood cells apheresed from a 48-year-old Caucasian male. Patient history included hypereosinophilia. The cell population was determined to be greater than 77% eosinophils by Wright's staining.

HEAADIR01   pINCY         The library was constructed using RNA isolated from diseased right atrium and heart muscle wall tissue removed from a 7-month-old Caucasian male who died from cardiopulmonary arrest due to Pompe's disease. Patient history included Pompe's disease, left ventricular hypertrophy, pyrexia, right complete cleft lip, cleft palate, chronic serous otitis media, hypertrophic cardiomyopathy, congestive heart failure, and developmental delays. Family history included acute myocardial infarction, diabetes, cystic fibrosis, and Down's syndrome.

LUNGNON03   PSPORT1       This normalized library was constructed from 2.56 million independent clones from a lung tissue library. RNA was made from lung tissue removed from the left lobe of a 58-year-old Caucasian male during a segmental lung resection. Pathology for the associated tumor tissue indicated a metastatic grade 3 (of 4) osteosarcoma. Patient history included soft tissue cancer, secondary cancer of the lung, prostate cancer, and an acute duodenal ulcer with hemorrhage. Patient also received radiation therapy to the retroperitoneum. Family history included prostate cancer, breast cancer, and acute leukemia. The normalization and hybridization conditions were adapted from Soares et al., PNAS (1994) 91:9228; Swaroop et al., NAR (1991) 19:1954; and Bonaldo et al., Genome Research (1996) 6:791.

OVARTUT03   pINCY         Library was constructed using RNA isolated from ovarian tumor tissue removed from the left ovary of a 52-year-old mixed ethnicity female during a total abdominal hysterectomy, bilateral salpingo-oophorectomy, peritoneal and lymphatic structure biopsy, regional lymph node excision, and peritoneal tissue destruction. Pathology indicated an invasive grade 3 (of 4) seroanaplastic carcinoma forming a mass in the left ovary. Multiple tumor implants were present on the surface of the left ovary and fallopian tube, right ovary and fallopian tube, posterior surface of the uterus, and cul-de-sac. The endometrium was atrophic. Multiple (2) leiomyomata were identified, one subserosal and one intramural. Pathology also indicated a metastatic grade 3 seroanaplastic carcinoma involving the omentum, cul-de-sac peritoneum, left broad ligament peritoneum, and mesentery colon. Patient history included breast cancer, chronic peptic ulcer, and joint pain. Family history included colon cancer, cerebrovascular disease, breast cancer, type II diabetes, esophagus cancer, and depressive disorder.

PGANNOT01   PSPORT1       Library was constructed using RNA isolated from paraganglionic tumor tissue removed from the intra-abdominal region of a 46-year-old Caucasian male during exploratory laparotomy. Pathology indicated a benign paraganglioma and was associated with a grade 2 renal cell carcinoma, clear cell type, which did not penetrate the capsule. Surgical margins were negative for tumor.

PLACNOT02   pINCY         Library was constructed using RNA isolated from the placental tissue of a Hispanic female fetus, who was prematurely delivered at 21 weeks' gestation. Serologies of the mother's blood were positive for CMV (cytomegalovirus).

PROSNOT14   pINCY         Library was constructed using RNA isolated from diseased prostate tissue removed from a 60-year-old Caucasian male during radical prostatectomy and regional lymph node excision. Pathology indicated adenofibromatous hyperplasia. Pathology for the associated tumor tissue indicated an adenocarcinoma (Gleason grade 3 + 4). The patient presented with elevated prostate specific antigen (PSA). Patient history included a kidney cyst and hematuria. Family history included benign hypertension, cerebrovascular disease, and arteriosclerotic coronary artery disease.

PROSNOT06   PSPORT        Library was constructed using RNA isolated from the diseased prostate tissue of a 57-year-old Caucasian male during radical prostatectomy, removal of both testes and excision of regional lymph nodes. Pathology indicated adenofibromatous hyperplasia. Pathology for the matched tumor tissue indicated adenocarcinoma (Gleason grade 3 + 3) in both the left and right periphery of the prostate. There was perineural invasion, and the tumor perforated the capsule. A single right pelvic lymph node and the right and left apical surgical margins were positive for tumor. Patient history included a benign neoplasm of the large bowel and type I diabetes. Patient medications included insulin. Family history included a malignant neoplasm of the prostate in the father and type I diabetes in the mother.

PROSTMC01   pINCY         This size-selected library was constructed using RNA isolated from diseased prostate tissue removed from a 55-year-old Caucasian male during a radical prostatectomy, regional lymph node excision, and prostate needle biopsy. Pathology indicated adenofibromatous hyperplasia. Pathology for the matched tumor tissue indicated adenocarcinoma, Gleason grade 5 + 4, forming a predominant mass involving the left side peripherally with extension into the right posterior superior region. The tumor invaded and perforated the capsule to involve periprostatic tissue in the left posterior superior region. The left inferior and superior posterior surgical margins were positive. The right and left seminal vesicles, bladder neck tissue (after re-excision), and multiple pelvic lymph nodes were negative for tumor. One (of 9) left pelvic lymph nodes was metastatically involved. The patient presented with elevated prostate specific antigen (PSA). Patient history included calculus of the kidney. Previous surgeries included an adenotonsillectomy. Patient medications included Khats claw, an herbal preparation. Family history included breast cancer in the mother; lung cancer in the father; and breast cancer in the si

SPLNTUT02   pINCY         Library was constructed using RNA isolated from spleen tumor tissue obtained from a 45-year-old male during a staging laparotomy. Pathology indicated nodular sclerosing type of Hodgkin's disease forming innumerable nodules. Multiple lymph nodes were positive for Hodgkin's disease.

TESTTUT02   pINCY         Library was constructed using RNA isolated from testicular tumor removed from a 31-year-old Caucasian male during unilateral orchiectomy. Pathology indicated embryonal carcinoma.

THYRDIE01   PCDNA2.1      This 5′ biased random primed library was constructed using RNA isolated from diseased thyroid tissue removed from a 22-year-old Caucasian female during closed thyroid biopsy, partial thyroidectomy, and regional lymph node excision. Pathology indicated adenomatous hyperplasia. The patient presented with malignant neoplasm of the thyroid. Patient history included normal delivery, alcohol abuse, and tobacco abuse. Previous surgeries included myringotomy. Patient medications included an unspecified type of birth control pills. Family history included hyperlipidemia and depressive disorder in the mother; and benign hypertension, congestive heart failure, and chronic leukemia in the grandparent(s).

TONSDIT01   pINCY         Library was constructed using RNA isolated from the tonsil tissue of a 6-year-old Caucasian male during adenotonsillectomy. Pathology indicated lymphoid hyperplasia of the tonsils.
The patient presented with anabscess of the pharynx. The patient was not taking any medications.Family history included hypothyroidism in the grand- parent(s) andbenign skin neoplasm in the sibling(s).

[0368] TABLE 7

Program: ABI FACTURA
Description: A program that removes vector sequences and masks ambiguous bases in nucleic acid sequences.
Reference: Applied Biosystems, Foster City, CA.

Program: ABI/PARACEL FDF
Description: A Fast Data Finder useful in comparing and annotating amino acid or nucleic acid sequences.
Reference: Applied Biosystems, Foster City, CA; Paracel Inc., Pasadena, CA.
Parameter Threshold: Mismatch <50%

Program: ABI AutoAssembler
Description: A program that assembles nucleic acid sequences.
Reference: Applied Biosystems, Foster City, CA.

Program: BLAST
Description: A Basic Local Alignment Search Tool useful in sequence similarity search for amino acid and nucleic acid sequences. BLAST includes five functions: blastp, blastn, blastx, tblastn, and tblastx.
Reference: Altschul, S. F. et al. (1990) J. Mol. Biol. 215:403-410; Altschul, S. F. et al. (1997) Nucleic Acids Res. 25:3389-3402.
Parameter Threshold: ESTs: Probability value = 1.0E−8 or less; Full Length sequences: Probability value = 1.0E−10 or less

Program: FASTA
Description: A Pearson and Lipman algorithm that searches for similarity between a query sequence and a group of sequences of the same type. FASTA comprises at least five functions: fasta, tfasta, fastx, tfastx, and ssearch.
Reference: Pearson, W. R. and D. J. Lipman (1988) Proc. Natl. Acad Sci. USA 85:2444-2448; Pearson, W. R. (1990) Methods Enzymol. 183:63-98; and Smith, T. F. and M. S. Waterman (1981) Adv. Appl. Math. 2:482-489.
Parameter Threshold: ESTs: fasta E value = 1.06E−6; Assembled ESTs: fasta Identity = 95% or greater and Match length = 200 bases or greater; fastx E value = 1.0E−8 or less; Full Length sequences: fastx score = 100 or greater

Program: BLIMPS
Description: A BLocks IMProved Searcher that matches a sequence against those in BLOCKS, PRINTS, DOMO, PRODOM, and PFAM databases to search for gene families, sequence homology, and structural fingerprint regions.
Reference: Henikoff, S. and J. G. Henikoff (1991) Nucleic Acids Res. 19:6565-6572; Henikoff, J. G. and S. Henikoff (1996) Methods Enzymol. 266:88-105; and Attwood, T. K. et al. (1997) J. Chem. Inf. Comput. Sci. 37:417-
Parameter Threshold: Probability value = 1.0E−3 or less

Program: HMMER
Description: An algorithm for searching a query sequence against hidden Markov model (HMM)-based databases of protein family consensus sequences, such as PFAM, INCY, SMART and TIGRFAM.
Reference: Krogh, A. et al. (1994) J. Mol. Biol. 235:1501-1531; Sonnhammer, E. L. L. et al. (1988) Nucleic Acids Res. 26:320-322; Durbin, R. et al. (1998) Our World View, in a Nutshell, Cambridge Univ. Press, pp. 1-
Parameter Threshold: PFAM, INCY, SMART or TIGRFAM hits: Probability value = 1.0E−3 or less; Signal peptide hits: Score = 0 or greater

Program: ProfileScan
Description: An algorithm that searches for structural and sequence motifs in protein sequences that match sequence patterns defined in Prosite.
Reference: Gribskov, M. et al. (1988) CABIOS 4:61-66; Gribskov, M. et al. (1989) Methods Enzymol. 183:146-159; Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221.
Parameter Threshold: Normalized quality score = GCG specified ‘HIGH’ value for that particular Prosite motif. Generally, score = 1.4-2.1.

Program: Phred
Description: A base-calling algorithm that examines automated sequencer traces with high sensitivity and probability.
Reference: Ewing, B. et al. (1998) Genome Res. 8:175-185; Ewing, B. and P. Green (1998) Genome Res. 8:186-194.

Program: Phrap
Description: A Phils Revised Assembly Program including SWAT and CrossMatch, programs based on efficient implementation of the Smith-Waterman algorithm, useful in searching sequence homology and assembling DNA sequences.
Reference: Smith, T. F. and M. S. Waterman (1981) Adv. Appl. Math. 2:482-489; Smith, T. F. and M. S. Waterman (1981) J. Mol. Biol. 147:195-197; and Green, P., University of Washington, Seattle, WA.
Parameter Threshold: Score = 120 or greater; Match length = 56 or greater

Program: Consed
Description: A graphical tool for viewing and editing Phrap assemblies.
Reference: Gordon, D. et al. (1998) Genome Res. 8:195-202.

Program: SPScan
Description: A weight matrix analysis program that scans protein sequences for the presence of secretory signal peptides.
Reference: Nielson, H. et al. (1997) Protein Engineering 10:1-6; Claverie, J. M. and S. Audic (1997) CABIOS 12:431-439.
Parameter Threshold: Score = 3.5 or greater

Program: TMAP
Description: A program that uses weight matrices to delineate transmembrane segments on protein sequences and determine orientation.
Reference: Persson, B. and P. Argos (1994) J. Mol. Biol. 237:182-192; Persson, B. and P. Argos (1996) Protein Sci. 5:363-371.

Program: TMHMMER
Description: A program that uses a hidden Markov model (HMM) to delineate transmembrane segments on protein sequences and determine orientation.
Reference: Sonnhammer, E. L. et al. (1998) Proc. Sixth Intl. Conf. On Intelligent Systems for Mol. Biol., Glasgow et al., eds., The Am. Assoc. for Artificial Intelligence (AAAI) Press, Menlo Park, CA, and MIT Press, Cambridge, MA, pp. 175-182.

Program: Motifs
Description: A program that searches amino acid sequences for patterns that matched those defined in Prosite.
Reference: Bairoch, A. et al. (1997) Nucleic Acids Res. 25:217-221; Wisconsin Package Program Manual, version 9, page M51-59, Genetics Computer Group, Madison, WI.
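
The parameter thresholds listed in Table 7 act as accept/reject cutoffs on search and assembly results. The following is a minimal sketch of how two of them (the BLAST probability values and the Phrap score and match length) might be applied, assuming hits have already been parsed into simple records. The `Hit` record, its field names, and the `passes_threshold` helper are hypothetical illustrations and are not part of any program named in the table.

```python
from dataclasses import dataclass

# Cutoff values copied from Table 7. The dictionary keys and constant
# names are illustrative assumptions, not identifiers from any tool.
BLAST_MAX_PROBABILITY = {"EST": 1.0e-8, "full_length": 1.0e-10}
PHRAP_MIN_SCORE = 120
PHRAP_MIN_MATCH_LENGTH = 56


@dataclass
class Hit:
    """Hypothetical parsed result from one of the Table 7 programs."""
    program: str             # "BLAST" or "Phrap" in this sketch
    sequence_type: str = ""  # "EST" or "full_length" (BLAST only)
    probability: float = 1.0
    score: float = 0.0
    match_length: int = 0


def passes_threshold(hit: Hit) -> bool:
    """Return True if the hit meets the Table 7 cutoff for its program."""
    if hit.program == "BLAST":
        # "Probability value = X or less"
        return hit.probability <= BLAST_MAX_PROBABILITY[hit.sequence_type]
    if hit.program == "Phrap":
        # "Score = 120 or greater; Match length = 56 or greater"
        return (hit.score >= PHRAP_MIN_SCORE
                and hit.match_length >= PHRAP_MIN_MATCH_LENGTH)
    raise ValueError(f"no threshold defined in this sketch for {hit.program!r}")


if __name__ == "__main__":
    hits = [
        Hit("BLAST", sequence_type="EST", probability=1.0e-9),
        Hit("BLAST", sequence_type="full_length", probability=1.0e-9),
        Hit("Phrap", score=150, match_length=60),
    ]
    for hit in hits:
        print(hit.program, hit.sequence_type, passes_threshold(hit))
```

The same pattern would extend to the other rows of the table by adding one branch per program; keeping the cutoffs in named constants makes it easy to check them against the table.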

[0369]

1 42 1 767 PRT Homo sapiens misc_feature Incyte ID No 5155802CD1 1 MetPro Thr Val Ile Ser Ala Ser Val Ala Pro Arg Thr Ala Ala 1 5 10 15 GluPro Arg Ser Pro Gly Pro Val Pro His Pro Ala Gln Ser Lys 20 25 30 Ala ThrGlu Ala Gly Gly Gly Asn Pro Ser Gly Ile Tyr Ser Ala 35 40 45 Ile Ile SerArg Asn Phe Pro Ile Ile Gly Val Lys Glu Lys Thr 50 55 60 Phe Glu Gln LeuHis Lys Lys Cys Leu Glu Lys Lys Val Leu Tyr 65 70 75 Val Asp Pro Glu PhePro Pro Asp Glu Thr Ser Leu Phe Tyr Ser 80 85 90 Gln Lys Phe Pro Ile GlnPhe Val Trp Lys Arg Pro Pro Glu Ile 95 100 105 Cys Glu Asn Pro Arg PheIle Ile Asp Gly Ala Asn Arg Thr Asp 110 115 120 Ile Cys Gln Gly Glu LeuGly Asp Cys Trp Phe Leu Ala Ala Ile 125 130 135 Ala Cys Leu Thr Leu AsnGln His Leu Leu Phe Arg Val Ile Pro 140 145 150 His Asp Gln Ser Phe IleGlu Asn Tyr Ala Gly Ile Phe His Phe 155 160 165 Gln Phe Trp Arg Tyr GlyGlu Trp Val Asp Val Val Ile Asp Asp 170 175 180 Cys Leu Pro Thr Tyr AsnAsn Gln Leu Val Phe Thr Lys Ser Asn 185 190 195 His Arg Asn Glu Phe TrpSer Ala Leu Leu Glu Lys Ala Tyr Ala 200 205 210 Lys Leu His Gly Ser TyrGlu Ala Leu Lys Gly Gly Asn Thr Thr 215 220 225 Glu Ala Met Glu Asp PheThr Gly Gly Val Thr Glu Phe Phe Glu 230 235 240 Ile Arg Asp Ala Pro SerAsp Met Tyr Lys Ile Met Lys Lys Ala 245 250 255 Ile Glu Arg Gly Ser LeuMet Gly Cys Ser Ile Asp Thr Ile Ile 260 265 270 Pro Val Gln Tyr Glu ThrArg Met Ala Cys Gly Leu Val Arg Gly 275 280 285 His Ala Tyr Ser Val ThrGly Leu Asp Glu Val Pro Phe Lys Gly 290 295 300 Glu Lys Val Lys Leu ValArg Leu Arg Asn Pro Trp Gly Gln Val 305 310 315 Glu Trp Asn Gly Ser TrpSer Asp Arg Trp Lys Asp Trp Ser Phe 320 325 330 Val Asp Lys Asp Glu LysAla Arg Leu Gln His Gln Val Thr Glu 335 340 345 Asp Gly Glu Phe Trp MetSer Tyr Glu Asp Phe Ile Tyr His Phe 350 355 360 Thr Lys Leu Glu Ile CysAsn Leu Thr Ala Asp Ala Leu Gln Ser 365 370 375 Asp Lys Leu Gln Thr TrpThr Val Ser Val Asn Glu Gly Arg Trp 380 385 390 Val Arg Gly Cys Ser AlaGly Gly Cys Arg Asn Phe Pro Asp Thr 395 400 405 Phe Trp Thr Asn Pro GlnTyr Arg 
Leu Lys Leu Leu Glu Glu Asp 410 415 420 Asp Asp Pro Asp Asp SerGlu Val Ile Cys Ser Phe Leu Val Ala 425 430 435 Leu Met Gln Lys Asn ArgArg Lys Asp Arg Lys Leu Gly Ala Ser 440 445 450 Leu Phe Thr Ile Gly PheAla Ile Tyr Glu Val Pro Lys Glu Met 455 460 465 His Gly Asn Lys Gln HisLeu Gln Lys Asp Phe Phe Leu Tyr Asn 470 475 480 Ala Ser Lys Ala Arg SerLys Thr Tyr Ile Asn Met Arg Glu Val 485 490 495 Ser Gln Arg Phe Arg LeuPro Pro Ser Glu Tyr Val Ile Val Pro 500 505 510 Ser Thr Tyr Glu Pro HisGln Glu Gly Glu Phe Ile Leu Arg Val 515 520 525 Phe Ser Glu Lys Arg AsnLeu Ser Glu Glu Val Glu Asn Thr Ile 530 535 540 Ser Val Asp Arg Pro ValPro Ile Ile Phe Val Ser Asp Arg Ala 545 550 555 Asn Ser Asn Lys Glu LeuGly Val Asp Gln Glu Ser Glu Glu Gly 560 565 570 Lys Gly Lys Thr Ser ProAsp Lys Gln Lys Gln Ser Pro Gln Pro 575 580 585 Gln Pro Gly Ser Ser AspGln Glu Ser Glu Glu Gln Gln Gln Phe 590 595 600 Arg Asn Ile Phe Lys GlnIle Ala Gly Asp Asp Met Glu Ile Cys 605 610 615 Ala Asp Glu Leu Lys LysVal Leu Asn Thr Val Val Asn Lys His 620 625 630 Lys Asp Leu Lys Thr HisGly Phe Thr Leu Glu Ser Cys Arg Ser 635 640 645 Met Ile Ala Leu Met AspThr Asp Gly Ser Gly Lys Leu Asn Leu 650 655 660 Gln Glu Phe His His LeuTrp Asn Lys Ile Lys Ala Trp Gln Lys 665 670 675 Ile Phe Lys His Tyr AspThr Asp Gln Ser Gly Thr Ile Asn Ser 680 685 690 Tyr Glu Met Arg Asn AlaVal Asn Asp Ala Gly Phe His Leu Asn 695 700 705 Asn Gln Leu Tyr Asp IleIle Thr Met Arg Tyr Ala Asp Lys His 710 715 720 Met Asn Ile Asp Phe AspSer Phe Ile Cys Cys Phe Val Arg Leu 725 730 735 Glu Gly Met Phe Arg AlaPhe His Ala Phe Asp Lys Asp Gly Asp 740 745 750 Gly Ile Ile Lys Leu AsnVal Leu Glu Trp Leu Gln Leu Thr Met 755 760 765 Tyr Ala 2 574 PRT Homosapiens misc_feature Incyte ID No 71269782CD1 2 Met Gly Glu Asn Glu AlaSer Leu Pro Asn Thr Ser Leu Gln Gly 1 5 10 15 Lys Lys Met Ala Tyr GlnLys Val His Ala Asp Gln Arg Ala Pro 20 25 30 Gly His Ser Gln Tyr Leu AspAsn Asp Asp Leu Gln Ala Thr Ala 35 40 45 Leu Asp Leu Glu Trp Asp Met GluLys Glu Leu Glu Glu 
Ser Gly 50 55 60 Phe Asp Gln Phe Gln Leu Asp Gly AlaGlu Asn Gln Asn Leu Gly 65 70 75 His Ser Glu Thr Ile Asp Leu Asn Leu AspSer Ile Gln Pro Ala 80 85 90 Thr Ser Pro Lys Gly Arg Phe Gln Arg Leu GlnGlu Glu Ser Asp 95 100 105 Tyr Ile Thr His Tyr Thr Arg Ser Ala Pro LysSer Asn Arg Cys 110 115 120 Asn Phe Cys His Val Leu Lys Ile Leu Cys ThrAla Thr Ile Leu 125 130 135 Phe Ile Phe Gly Ile Leu Ile Gly Tyr Tyr ValHis Thr Asn Cys 140 145 150 Pro Ser Asp Ala Pro Ser Ser Gly Thr Val AspPro Gln Leu Tyr 155 160 165 Gln Glu Ile Leu Lys Thr Ile Gln Ala Glu AspIle Lys Lys Ser 170 175 180 Phe Arg Asn Leu Val Gln Leu Tyr Lys Asn GluAsp Asp Met Glu 185 190 195 Ile Ser Lys Lys Ile Lys Thr Gln Trp Thr SerLeu Gly Leu Glu 200 205 210 Asp Val Gln Phe Val Asn Tyr Ser Val Leu LeuAsp Leu Pro Gly 215 220 225 Pro Ser Pro Ser Thr Val Thr Leu Ser Ser SerGly Gln Cys Phe 230 235 240 His Pro Asn Gly Gln Pro Cys Ser Glu Glu AlaArg Lys Asp Ser 245 250 255 Ser Gln Asp Leu Leu Tyr Ser Tyr Ala Ala TyrSer Ala Lys Gly 260 265 270 Thr Leu Lys Ala Glu Val Ile Asp Val Ser TyrGly Met Ala Asp 275 280 285 Asp Leu Lys Arg Ile Arg Lys Ile Lys Asn ValThr Asn Gln Ile 290 295 300 Ala Leu Leu Lys Leu Gly Lys Leu Pro Leu LeuTyr Lys Leu Ser 305 310 315 Ser Leu Glu Lys Ala Gly Phe Gly Gly Val LeuLeu Tyr Ile Asp 320 325 330 Pro Cys Asp Leu Pro Lys Thr Val Asn Pro SerHis Asp Thr Phe 335 340 345 Met Val Ser Leu Asn Pro Gly Gly Asp Pro SerThr Pro Gly Tyr 350 355 360 Pro Ser Val Asp Glu Ser Phe Arg Gln Ser ArgSer Asn Leu Thr 365 370 375 Ser Leu Leu Val Gln Pro Ile Ser Ala Ser LeuVal Ala Lys Leu 380 385 390 Ile Ser Ser Pro Lys Ala Arg Thr Lys Asn GluAla Cys Ser Ser 395 400 405 Leu Glu Leu Pro Asn Asn Glu Ile Arg Val ValSer Met Gln Val 410 415 420 Gln Thr Val Thr Lys Leu Lys Thr Val Thr AsnVal Val Gly Phe 425 430 435 Val Met Gly Leu Thr Ser Pro Asp Arg Tyr IleIle Val Gly Ser 440 445 450 His His His Thr Ala His Ser Tyr Asn Gly GlnGlu Trp Ala Ser 455 460 465 Ser Thr Ala Ile Ile Thr Ala Phe Ile Arg AlaLeu Met Ser Lys 470 475 480 
Val Lys Arg Gly Trp Arg Pro Asp Arg Thr IleVal Phe Cys Ser 485 490 495 Trp Gly Gly Thr Ala Phe Gly Asn Ile Gly SerTyr Glu Trp Gly 500 505 510 Glu Asp Phe Lys Lys Val Leu Gln Lys Asn ValVal Ala Tyr Ile 515 520 525 Ser Leu His Ser Pro Ile Arg Gly Asn Ser SerLeu Tyr Pro Val 530 535 540 Ala Ser Pro Ser Leu Gln Gln Leu Val Val GluVal Arg Gln Thr 545 550 555 Thr Ile Val Ser Asn Asp Tyr Ala Lys Pro ThrPhe Ser Leu Tyr 560 565 570 Phe Asp Ile Ser 3 320 PRT Homo sapiensmisc_feature Incyte ID No 7472651CD1 3 Met Gly Asp Pro Glu Gly Ser AlaGlu Trp Gly Trp Gly Lys Gly 1 5 10 15 Ile Pro Val Val Arg Arg Asn LeuLeu Thr Val Asp Gly Ile Ser 20 25 30 Leu Cys Leu Glu Gly Ser Trp Trp ArgGln Lys Gly Pro Ala Ser 35 40 45 Pro Gly Phe Ser His Ser Leu Pro Arg LeuGln Pro Asn Pro Gly 50 55 60 Pro Ser Ser Thr Met Trp Leu Leu Leu Thr LeuSer Phe Leu Leu 65 70 75 Ala Ser Thr Ala Ala Gln Asp Gly Asp Lys Leu LeuGlu Gly Asp 80 85 90 Glu Cys Ala Pro His Ser Gln Pro Trp Gln Val Ala LeuTyr Glu 95 100 105 Arg Gly Arg Phe Asn Cys Gly Ala Ser Leu Ile Ser ProHis Trp 110 115 120 Val Leu Ser Ala Ala His Cys Gln Ser Arg Phe Met ArgVal Arg 125 130 135 Leu Gly Glu His Asn Leu Arg Lys Arg Asp Gly Pro GluGln Leu 140 145 150 Arg Thr Thr Ser Arg Val Ile Pro His Pro Arg Tyr GluAla Arg 155 160 165 Ser His Arg Asn Asp Ile Met Leu Leu Arg Leu Val GlnPro Ala 170 175 180 Arg Leu Asn Pro Gln Val Arg Pro Ala Val Leu Pro ThrArg Cys 185 190 195 Pro His Pro Gly Glu Ala Cys Val Val Ser Gly Trp GlyLeu Val 200 205 210 Ser His Asn Glu Pro Gly Thr Ala Gly Ser Pro Arg SerGln Val 215 220 225 Ser Leu Pro Asp Thr Leu His Cys Ala Asn Ile Ser IleIle Ser 230 235 240 Asp Thr Ser Cys Asp Lys Ser Tyr Pro Gly Arg Leu ThrAsn Thr 245 250 255 Met Val Cys Ala Gly Ala Glu Gly Arg Gly Ala Glu SerCys Glu 260 265 270 Gly Asp Ser Gly Gly Pro Leu Val Cys Gly Gly Ile LeuGln Gly 275 280 285 Ile Val Ser Trp Gly Asp Val Pro Cys Asp Asn Thr ThrLys Pro 290 295 300 Gly Val Tyr Thr Lys Val Cys His Tyr Leu Glu Trp IleArg Glu 305 310 315 Thr Met Lys Arg Asn 320 
4 378 PRT Homo sapiensmisc_feature Incyte ID No 7478251CD1 4 Met Ala Glu Lys Pro Ser Asn GlyVal Leu Val His Met Val Lys 1 5 10 15 Leu Leu Ile Lys Thr Phe Leu AspGly Ile Phe Asp Asp Leu Met 20 25 30 Glu Asn Asn Val Leu Asn Thr Asp GluIle His Leu Ile Gly Lys 35 40 45 Cys Leu Lys Phe Val Val Ser Asn Ala GluAsn Leu Val Asp Asp 50 55 60 Ile Thr Glu Thr Ala Gln Thr Ala Gly Lys IlePhe Arg Glu His 65 70 75 Leu Trp Asn Ser Lys Lys Gln Leu Ser Ser Ile PhePhe Ser Leu 80 85 90 Ser Ala Phe Leu Glu Ile Gln Gly Ala Gln Pro Ser GlyLys Leu 95 100 105 Lys Leu Cys Pro His Ala His Phe His Glu Leu Lys ThrLys Arg 110 115 120 Ala Asp Glu Ile Tyr Pro Val Met Glu Lys Lys Arg ArgThr Cys 125 130 135 Leu Gly Leu Asn Ile Arg Asn Lys Glu Phe Asn Tyr LeuHis Asn 140 145 150 Arg Asn Gly Ser Glu Leu Asp Leu Leu Gly Met Arg AspLeu Leu 155 160 165 Glu Asn Leu Gly Tyr Ser Val Val Ile Lys Glu Asn LeuThr Ala 170 175 180 Gln Glu Met Glu Thr Ala Leu Arg Gln Phe Ala Ala HisPro Glu 185 190 195 His Gln Ser Ser Asp Ser Thr Phe Leu Val Phe Met SerHis Ser 200 205 210 Ile Leu Asn Gly Ile Cys Gly Thr Lys His Trp Asp GlnGlu Pro 215 220 225 Asp Val Leu His Asp Asp Thr Ile Phe Glu Ile Phe AsnAsn Arg 230 235 240 Asn Cys Gln Ser Leu Lys Asp Lys Pro Lys Val Ile IleMet Gln 245 250 255 Ala Cys Arg Gly Asn Gly Ala Gly Ile Val Trp Phe ThrThr Asp 260 265 270 Ser Gly Lys Ala Gly Ala Asp Thr His Gly Arg Leu LeuGln Gly 275 280 285 Asn Ile Cys Asn Asp Ala Val Thr Lys Ala His Val GluLys Asp 290 295 300 Phe Ile Ala Phe Lys Ser Ser Thr Pro His Asn Val SerTrp Arg 305 310 315 His Glu Thr Asn Gly Ser Val Phe Ile Ser Gln Ile IleTyr Tyr 320 325 330 Phe Arg Glu Tyr Ser Trp Ser His His Leu Glu Glu IlePhe Gln 335 340 345 Lys Val Gln His Ser Phe Glu Thr Pro Asn Ile Leu ThrGln Leu 350 355 360 Pro Thr Ile Glu Arg Leu Ser Met Thr Arg Tyr Phe TyrLeu Phe 365 370 375 Pro Gly Asn 5 366 PRT Homo sapiens misc_featureIncyte ID No 2759385CD1 5 Met Thr Val Arg Asn Ile Ala Ser Ile Cys AsnMet Gly Thr Asn 1 5 10 15 Ala Ser Ala Leu Glu Lys Asp Ile Gly Pro 
GluGln Phe Pro Ile 20 25 30 Asn Glu His Tyr Phe Gly Leu Val Asn Phe Gly AsnThr Cys Tyr 35 40 45 Cys Asn Ser Val Leu Gln Ala Leu Tyr Phe Cys Arg ProPhe Arg 50 55 60 Glu Asn Val Leu Ala Tyr Lys Ala Gln Gln Lys Lys Lys GluAsn 65 70 75 Leu Leu Thr Cys Leu Ala Asp Leu Phe His Ser Ile Ala Thr Gln80 85 90 Lys Lys Lys Val Gly Val Ile Pro Pro Lys Lys Phe Ile Ser Arg 95100 105 Leu Arg Lys Glu Asn Asp Leu Phe Asp Asn Tyr Met Gln Gln Asp 110115 120 Ala His Glu Phe Leu Asn Tyr Leu Leu Asn Thr Ile Ala Asp Ile 125130 135 Leu Gln Glu Glu Lys Lys Gln Glu Lys Gln Asn Gly Lys Leu Lys 140145 150 Asn Gly Asn Met Asn Glu Pro Ala Glu Asn Asn Lys Pro Glu Leu 155160 165 Thr Trp Val His Glu Ile Phe Gln Gly Thr Leu Thr Asn Glu Thr 170175 180 Arg Cys Leu Asn Cys Glu Thr Val Ser Ser Lys Asp Glu Asp Phe 185190 195 Leu Asp Leu Ser Val Asp Val Glu Gln Asn Thr Ser Ile Thr His 200205 210 Cys Leu Arg Asp Phe Ser Asn Thr Glu Thr Leu Cys Ser Glu Gln 215220 225 Lys Tyr Tyr Cys Glu Thr Cys Cys Ser Lys Gln Glu Ala Gln Lys 230235 240 Arg Met Arg Val Lys Lys Leu Pro Met Ile Leu Ala Leu His Leu 245250 255 Lys Arg Phe Lys Tyr Met Glu Gln Leu His Arg Tyr Thr Lys Leu 260265 270 Ser Tyr Arg Val Val Phe Pro Leu Glu Leu Arg Leu Phe Asn Thr 275280 285 Ser Ser Asp Ala Val Asn Leu Asp Arg Met Tyr Asp Leu Val Ala 290295 300 Val Val Val His Cys Gly Ser Gly Pro Asn Arg Gly His Tyr Ile 305310 315 Thr Ile Val Lys Ser His Gly Phe Trp Leu Leu Phe Asp Asp Asp 320325 330 Ile Val Glu Lys Ile Asp Ala Gln Ala Ile Glu Glu Phe Tyr Gly 335340 345 Leu Thr Ser Asp Ile Ser Lys Asn Ser Glu Ser Gly Tyr Ile Leu 350355 360 Phe Tyr Gln Ser Arg Glu 365 6 389 PRT Homo sapiens misc_featureIncyte ID No 4226182CD1 6 Met Asp Tyr Pro Arg Tyr Leu Gly Ala Val PhePro Gly Thr Met 1 5 10 15 Cys Ile Thr Arg Tyr Ser Ala Gly Val Ala LeuGln Cys Gly Pro 20 25 30 Ala Ser Cys Cys Asp Phe Arg Thr Cys Val Leu LysAsp Gly Ala 35 40 45 Lys Cys Tyr Lys Gly Leu Cys Cys Lys Asp Cys Gln IleLeu Gln 50 55 60 Ser Gly Val Glu Cys Arg Pro Lys Ala His Pro Glu Cys AspIle 
65 70 75 Ala Glu Asn Cys Asn Gly Ser Ser Pro Glu Cys Gly Pro Asp Ile80 85 90 Thr Leu Ile Asn Gly Leu Ser Cys Lys Asn Asn Lys Phe Ile Cys 95100 105 Tyr Asp Gly Asp Cys His Asp Leu Asp Ala Arg Cys Glu Ser Val 110115 120 Phe Gly Lys Gly Ser Arg Asn Ala Pro Phe Ala Cys Tyr Glu Glu 125130 135 Ile Gln Ser Gln Ser Asp Arg Phe Gly Asn Cys Gly Arg Asp Arg 140145 150 Asn Asn Lys Tyr Val Phe Cys Gly Trp Arg Asn Leu Ile Cys Gly 155160 165 Arg Leu Val Cys Thr Tyr Pro Thr Arg Lys Pro Phe His Gln Glu 170175 180 Asn Gly Asp Val Ile Tyr Ala Phe Val Arg Asp Ser Val Cys Ile 185190 195 Thr Val Asp Tyr Lys Leu Pro Arg Thr Val Pro Asp Pro Leu Ala 200205 210 Val Lys Asn Gly Ser Gln Cys Asp Ile Gly Arg Val Cys Val Asn 215220 225 Arg Glu Cys Val Glu Ser Arg Ile Ile Lys Ala Ser Ala His Val 230235 240 Cys Ser Gln Gln Cys Ser Gly His Gly Val Cys Asp Ser Arg Asn 245250 255 Lys Cys His Cys Ser Pro Gly Tyr Lys Pro Pro Asn Cys Gln Ile 260265 270 Arg Ser Lys Gly Phe Ser Ile Phe Pro Glu Glu Asp Met Gly Ser 275280 285 Ile Met Glu Arg Ala Ser Gly Lys Thr Glu Asn Thr Trp Leu Leu 290295 300 Gly Phe Leu Ile Ala Leu Pro Ile Leu Ile Val Thr Thr Ala Ile 305310 315 Val Leu Ala Arg Lys Gln Leu Lys Lys Trp Phe Ala Lys Glu Glu 320325 330 Glu Phe Pro Ser Ser Glu Ser Lys Ser Glu Gly Ser Thr Gln Thr 335340 345 Tyr Ala Ser Gln Ser Ser Ser Glu Gly Ser Thr Gln Thr Tyr Ala 350355 360 Ser Gln Thr Arg Ser Glu Ser Ser Ser Gln Ala Asp Thr Ser Lys 365370 375 Ser Lys Ser Gln Asp Ser Thr Gln Thr Gln Ser Ser Ser Asn 380 3857 217 PRT Homo sapiens misc_feature Incyte ID No 5078962CD1 7 Met ThrThr Glu Glu Ile Asp Ala Leu Val His Arg Glu Ile Ile 1 5 10 15 Ser HisAsn Ala Tyr Pro Ser Pro Leu Gly Tyr Gly Gly Phe Pro 20 25 30 Lys Ser ValCys Thr Ser Val Asn Asn Val Leu Cys His Gly Ile 35 40 45 Pro Asp Ser ArgPro Leu Gln Asp Gly Asp Ile Ile Asn Ile Asp 50 55 60 Val Thr Val Tyr TyrAsn Gly Tyr His Gly Asp Thr Ser Glu Thr 65 70 75 Phe Leu Val Gly Asn ValAsp Glu Cys Gly Lys Lys Leu Val Glu 80 85 90 Val Ala Arg Arg Cys Arg AspGlu Ala Ile 
Ala Ala Cys Arg Ala 95 100 105 Gly Ala Pro Phe Ser Val IleGly Asn Thr Ile Ser His Ile Thr 110 115 120 His Gln Asn Gly Phe Gln ValCys Pro His Phe Val Gly His Gly 125 130 135 Ile Gly Ser Tyr Phe His GlyHis Pro Glu Ile Trp His His Ala 140 145 150 Asn Asp Ser Asp Leu Pro MetGlu Glu Gly Met Ala Phe Thr Ile 155 160 165 Glu Pro Ile Ile Thr Glu GlySer Pro Glu Phe Lys Val Leu Glu 170 175 180 Asp Ala Trp Thr Val Val SerLeu Asp Asn Gln Arg Ser Ala Gln 185 190 195 Phe Glu His Thr Val Leu IleThr Ser Arg Gly Ala Gln Ile Leu 200 205 210 Thr Lys Leu Pro His Glu Ala215 8 486 PRT Homo sapiens misc_feature Incyte ID No 7474340CD1 8 MetGlu Arg Asp Ser His Gly Asn Ala Ser Pro Ala Arg Thr Pro 1 5 10 15 SerAla Gly Ala Ser Pro Ala Gln Ala Ser Pro Ala Gly Thr Pro 20 25 30 Pro GlyArg Ala Ser Pro Ala Gln Ala Ser Pro Ala Gln Ala Ser 35 40 45 Pro Ala GlyThr Pro Pro Gly Arg Ala Ser Pro Ala Gln Ala Ser 50 55 60 Pro Ala Gly ThrPro Pro Gly Arg Ala Ser Pro Gly Arg Ala Ser 65 70 75 Pro Ala Gln Ala SerPro Ala Arg Ala Ser Pro Ala Leu Ala Ser 80 85 90 Leu Ser Arg Ser Ser SerGly Arg Ser Ser Ser Ala Arg Ser Ala 95 100 105 Ser Val Thr Thr Ser ProThr Arg Val Tyr Leu Val Arg Ala Thr 110 115 120 Pro Val Gly Ala Val ProIle Arg Ser Ser Pro Ala Arg Ser Ala 125 130 135 Pro Ala Thr Arg Ala ThrArg Glu Ser Pro Gly Thr Ser Leu Pro 140 145 150 Lys Phe Thr Trp Arg GluGly Gln Lys Gln Leu Pro Leu Ile Gly 155 160 165 Cys Val Leu Leu Leu IleAla Leu Val Val Ser Leu Ile Ile Leu 170 175 180 Phe Gln Phe Trp Gln GlyHis Thr Gly Ile Arg Tyr Lys Glu Gln 185 190 195 Arg Glu Ser Cys Pro LysHis Ala Val Arg Cys Asp Gly Val Val 200 205 210 Asp Cys Lys Leu Lys SerAsp Glu Leu Gly Cys Val Arg Phe Asp 215 220 225 Trp Asp Lys Ser Leu LeuLys Ile Tyr Ser Gly Ser Ser His Gln 230 235 240 Trp Leu Pro Ile Cys SerSer Asn Trp Asn Asp Ser Tyr Ser Glu 245 250 255 Lys Thr Cys Gln Gln LeuGly Phe Glu Ser Ala His Arg Thr Thr 260 265 270 Glu Val Ala His Arg AspPhe Ala Asn Ser Phe Ser Ile Leu Arg 275 280 285 Tyr Asn Ser Thr Ile GlnGlu Ser Leu His Arg Ser 
Glu Cys Pro 290 295 300 Ser Gln Arg Tyr Ile SerLeu Gln Cys Ser His Cys Gly Leu Arg 305 310 315 Ala Met Thr Gly Arg IleVal Gly Gly Ala Leu Ala Ser Asp Ser 320 325 330 Lys Trp Pro Trp Gln ValSer Leu His Phe Gly Thr Thr His Ile 335 340 345 Cys Gly Gly Thr Leu IleAsp Ala Gln Trp Val Leu Thr Ala Ala 350 355 360 His Cys Phe Phe Val ThrArg Glu Lys Val Leu Glu Gly Trp Lys 365 370 375 Val Tyr Ala Gly Thr SerAsn Leu His Gln Leu Pro Glu Ala Ala 380 385 390 Ser Ile Ala Glu Ile IleIle Asn Ser Asn Tyr Thr Asp Glu Glu 395 400 405 Asp Asp Tyr Asp Ile AlaLeu Met Arg Leu Ser Lys Pro Leu Thr 410 415 420 Leu Ser Gly Glu Gly IleCys Thr Pro Arg Ser Pro Ala Pro Gln 425 430 435 Pro Gln His Pro Leu GlnPro Ser His Leu Ser Ala Ser Val Asn 440 445 450 Ser Tyr Pro Gly Pro LysAla Ser Ala Gly Gln Lys Ser Lys Thr 455 460 465 Leu Lys Asp Pro Tyr MetGlu His Phe Cys Phe Ile Ile Arg Glu 470 475 480 Thr Glu Ala Gln Gly Leu485 9 390 PRT Homo sapiens misc_feature Incyte ID No 7477287CD1 9 MetGly Pro Arg Leu Ile Pro Phe Leu Phe Leu Phe Val Tyr Pro 1 5 10 15 IleLeu Cys Arg Ile Ile Leu Arg Lys Gly Lys Ser Ile Arg Gln 20 25 30 Arg MetGlu Glu Gln Gly Val Leu Glu Thr Phe Leu Arg Asp His 35 40 45 Pro Lys AlaAsp Pro Ile Ala Lys Tyr Tyr Phe Asn Asn Asp Ala 50 55 60 Val Ala Tyr GluPro Phe Thr Asn Tyr Leu Asp Ser Phe Tyr Phe 65 70 75 Gly Glu Ile Ser ThrGly Thr Pro Pro Gln Asn Phe Leu Val Ser 80 85 90 Leu Ile Arg Val Pro ProIle Cys Ser Leu Pro Ser Ile Tyr Cys 95 100 105 Gln Ser Gln Val Cys SerAsn His Asn Arg Phe Asn Pro Ser Leu 110 115 120 Ser Ser Thr Phe Arg AsnAsp Gly Gln Thr Tyr Gly Leu Ser Tyr 125 130 135 Gly Ser Gly Ser Leu SerVal Phe Leu Gly Tyr Asp Thr Val Thr 140 145 150 Val His Asn Ile Val ValAsn Asn Gln Glu Phe Gly Leu Ser Glu 155 160 165 Asn Glu Pro Ser Asp ProPhe Tyr Tyr Ser Asp Phe Asp Gly Ile 170 175 180 Leu Gly Met Ala Tyr ProAsn Met Ala Glu Gly Asn Ser Pro Thr 185 190 195 Val Met Gln Gly Met LeuGln Gln Ser Gln Leu Thr Gln Pro Val 200 205 210 Phe Ser Phe Tyr Phe ThrCys Gln Pro Thr Arg Gln Tyr Cys Gly 
215 220 225 Glu Leu Ile Leu Gly GlyVal Asp Pro Asn Leu Tyr Ser Gly Gln 230 235 240 Ile Ile Trp Thr Pro ValSer Pro Glu Leu Tyr Trp Gln Ile Ala 245 250 255 Ile Glu Glu Phe Ala IleGly Asn Gln Ala Thr Gly Leu Cys Ser 260 265 270 Glu Gly Cys Gln Ala IleVal Asp Thr Glu Thr Phe Leu Leu Ala 275 280 285 Val Pro Gln Gln Tyr MetAla Ser Phe Leu Gln Ala Thr Gly Pro 290 295 300 Gln Gln Ala Gln Asn GlyAsp Phe Val Val Asn Cys Ser Tyr Ile 305 310 315 Gln Ser Met Pro Thr IleThr Phe Ile Ile Gly Gly Ala Gln Phe 320 325 330 Pro Leu Pro Pro Ser GluTyr Val Phe Asn Asn Asn Gly Tyr Cys 335 340 345 Arg Leu Gly Thr Glu AlaThr Cys Leu Pro Ser Arg Ser Gly Gln 350 355 360 Pro Leu Trp Ile Leu GlyAsp Val Phe Leu Lys Glu Tyr Cys Ser 365 370 375 Val Tyr Asp Met Ala AsnAsn Arg Val Gly Phe Ala Phe Ser Ala 380 385 390 10 1916 PRT Homo sapiensmisc_feature Incyte ID No 2994162CD1 10 Met Gly Ser Pro Asp Ala Ala AlaAla Val Arg Lys Asp Arg Leu 1 5 10 15 His Pro Arg Gln Val Lys Leu LeuGlu Thr Leu Ser Glu Tyr Glu 20 25 30 Ile Val Ser Pro Ile Arg Val Asn AlaLeu Gly Glu Pro Phe Pro 35 40 45 Thr Asn Val His Phe Lys Arg Thr Arg ArgSer Ile Asn Ser Ala 50 55 60 Thr Asp Pro Trp Pro Ala Phe Ala Ser Ser SerSer Ser Ser Thr 65 70 75 Ser Ser Gln Ala His Tyr Arg Leu Ser Ala Phe GlyGln Gln Phe 80 85 90 Leu Phe Asn Leu Thr Ala Asn Ala Gly Phe Ile Ala ProLeu Phe 95 100 105 Thr Val Thr Leu Leu Gly Thr Pro Gly Val Asn Gln ThrLys Phe 110 115 120 Tyr Ser Glu Glu Glu Ala Glu Leu Lys His Cys Phe TyrLys Gly 125 130 135 Tyr Val Asn Thr Asn Ser Glu His Thr Ala Val Ile SerLeu Cys 140 145 150 Ser Gly Met Leu Gly Thr Phe Arg Ser His Asp Gly AspTyr Phe 155 160 165 Ile Glu Pro Leu Gln Ser Met Asp Glu Gln Glu Asp GluGlu Glu 170 175 180 Gln Asn Lys Pro His Ile Ile Tyr Arg Arg Ser Ala ProGln Arg 185 190 195 Glu Pro Ser Thr Gly Arg His Ala Cys Asp Thr Ser GluHis Lys 200 205 210 Asn Arg His Ser Lys Asp Lys Lys Lys Thr Arg Ala ArgLys Trp 215 220 225 Gly Glu Arg Ile Asn Leu Ala Gly Asp Val Ala Ala LeuAsn Ser 230 235 240 Gly Leu Ala Thr Glu Ala 
Phe Ser Ala Tyr Gly Asn LysThr Asp 245 250 255 Asn Thr Arg Glu Lys Arg Thr His Arg Arg Thr Lys ArgPhe Leu 260 265 270 Ser Tyr Pro Arg Phe Val Glu Val Leu Val Val Ala AspAsn Arg 275 280 285 Met Val Ser Tyr His Gly Glu Asn Leu Gln His Tyr IleLeu Thr 290 295 300 Leu Met Ser Ile Val Ala Ser Ile Tyr Lys Asp Pro SerIle Gly 305 310 315 Asn Leu Ile Asn Ile Val Ile Val Asn Leu Ile Val IleHis Asn 320 325 330 Glu Gln Asp Gly Pro Ser Ile Ser Phe Asn Ala Gln ThrThr Leu 335 340 345 Lys Asn Phe Cys Gln Trp Gln His Ser Lys Asn Ser ProGly Gly 350 355 360 Ile His His Asp Thr Ala Val Leu Leu Thr Arg Gln AspIle Cys 365 370 375 Arg Ala His Asp Lys Cys Asp Thr Leu Gly Leu Ala GluLeu Gly 380 385 390 Thr Ile Cys Asp Pro Tyr Arg Ser Cys Ser Ile Ser GluAsp Ser 395 400 405 Gly Leu Ser Thr Ala Phe Thr Ile Ala His Glu Leu GlyHis Val 410 415 420 Phe Asn Met Pro His Asp Asp Asn Asn Lys Cys Lys GluGlu Gly 425 430 435 Val Lys Ser Pro Gln His Val Met Ala Pro Thr Leu AsnPhe Tyr 440 445 450 Thr Asn Pro Trp Met Trp Ser Lys Cys Ser Arg Lys TyrIle Thr 455 460 465 Glu Phe Leu Asp Thr Gly Tyr Gly Glu Cys Leu Leu AsnGlu Pro 470 475 480 Glu Ser Arg Pro Tyr Pro Leu Pro Val Gln Leu Pro GlyIle Leu 485 490 495 Tyr Asn Val Asn Lys Gln Cys Glu Leu Ile Phe Gly ProGly Ser 500 505 510 Gln Val Cys Pro Tyr Met Met Gln Cys Arg Arg Leu TrpCys Asn 515 520 525 Asn Val Asn Gly Val His Lys Gly Cys Arg Thr Gln HisThr Pro 530 535 540 Trp Ala Asp Gly Thr Glu Cys Glu Pro Gly Lys His CysLys Tyr 545 550 555 Gly Phe Cys Val Pro Lys Glu Met Asp Val Pro Val ThrAsp Gly 560 565 570 Ser Trp Gly Ser Trp Ser Pro Phe Gly Thr Cys Ser ArgThr Cys 575 580 585 Gly Gly Gly Ile Lys Thr Ala Ile Arg Glu Cys Asn ArgPro Glu 590 595 600 Pro Lys Asn Gly Gly Lys Tyr Cys Val Gly Arg Arg MetLys Phe 605 610 615 Lys Ser Cys Asn Thr Glu Pro Cys Leu Lys Gln Lys ArgAsp Phe 620 625 630 Arg Asp Glu Gln Cys Ala His Phe Asp Gly Lys His PheAsn Ile 635 640 645 Asn Gly Leu Leu Pro Asn Val Arg Trp Val Pro Lys TyrSer Gly 650 655 660 Ile Leu Met Lys Asp Arg Cys Lys Leu 
Phe Cys Arg ValAla Gly 665 670 675 Asn Thr Ala Tyr Tyr Gln Leu Arg Asp Arg Val Ile AspGly Thr 680 685 690 Pro Cys Gly Gln Asp Thr Asn Asp Ile Cys Val Gln GlyLeu Cys 695 700 705 Arg Gln Ala Gly Cys Asp His Val Leu Asn Ser Lys AlaArg Arg 710 715 720 Asp Lys Cys Gly Val Cys Gly Gly Asp Asn Ser Ser CysLys Thr 725 730 735 Val Ala Gly Thr Phe Asn Thr Val His Tyr Gly Tyr AsnThr Val 740 745 750 Val Arg Ile Pro Ala Gly Ala Thr Asn Ile Asp Val ArgGln His 755 760 765 Ser Phe Ser Gly Glu Thr Asp Asp Asp Asn Tyr Leu AlaLeu Ser 770 775 780 Ser Ser Lys Gly Glu Phe Leu Leu Asn Gly Asn Phe ValVal Thr 785 790 795 Met Ala Lys Arg Glu Ile Arg Ile Gly Asn Ala Val ValGlu Tyr 800 805 810 Ser Gly Ser Glu Thr Ala Val Glu Arg Ile Asn Ser ThrAsp Arg 815 820 825 Ile Glu Gln Glu Leu Leu Leu Gln Val Leu Ser Val GlyLys Leu 830 835 840 Tyr Asn Pro Asp Val Arg Tyr Ser Phe Asn Ile Pro IleGlu Asp 845 850 855 Lys Pro Gln Gln Phe Tyr Trp Asn Ser His Gly Pro TrpGln Ala 860 865 870 Cys Ser Lys Pro Cys Gln Gly Glu Arg Lys Arg Lys LeuVal Cys 875 880 885 Thr Arg Glu Ser Asp Gln Leu Thr Val Ser Asp Gln ArgCys Asp 890 895 900 Arg Leu Pro Gln Pro Gly His Ile Thr Glu Pro Cys GlyThr Asp 905 910 915 Cys Asp Leu Arg Trp His Val Ala Ser Arg Ser Glu CysSer Ala 920 925 930 Gln Cys Gly Leu Gly Tyr Arg Thr Leu Asp Ile Tyr CysAla Lys 935 940 945 Tyr Ser Arg Leu Asp Gly Lys Thr Glu Lys Val Asp AspGly Phe 950 955 960 Cys Ser Ser His Pro Lys Pro Ser Asn Arg Glu Lys CysSer Gly 965 970 975 Glu Cys Asn Thr Gly Gly Trp Arg Tyr Ser Ala Trp ThrGlu Cys 980 985 990 Ser Lys Ser Cys Asp Gly Gly Thr Gln Arg Arg Arg AlaIle Cys 995 1000 1005 Val Asn Thr Arg Asn Asp Val Leu Asp Asp Ser LysCys Thr His 1010 1015 1020 Gln Glu Lys Val Thr Ile Gln Arg Cys Ser GluPhe Pro Cys Pro 1025 1030 1035 Gln Trp Lys Ser Gly Asp Trp Ser Glu CysLeu Val Thr Cys Gly 1040 1045 1050 Lys Gly His Lys His Arg Gln Val TrpCys Gln Phe Gly Glu Asp 1055 1060 1065 Arg Leu Asn Asp Arg Met Cys AspPro Glu Thr Lys Pro Thr Ser 1070 1075 1080 Met Gln Thr Cys Gln Gln ProGlu 
Cys Ala Ser Trp Gln Ala Gly 1085 1090 1095 Pro Trp Gly Gln Cys Ser Val Thr Cys Gly Gln Gly Tyr Gln Leu 1100 1105 1110 Arg Ala Val Lys Cys Ile Ile Gly Thr Tyr Met Ser Val Val Asp 1115 1120 1125 Asp Asn Asp Cys Asn Ala Ala Thr Arg Pro Thr Asp Thr Gln Asp 1130 1135 1140 Cys Glu Leu Pro Ser Cys His Pro Pro Pro Ala Ala Pro Glu Thr 1145 1150 1155 Arg Arg Ser Thr Tyr Ser Ala Pro Arg Thr Gln Trp Arg Phe Gly 1160 1165 1170 Ser Trp Thr Pro Cys Ser Ala Thr Cys Gly Lys Gly Thr Arg Met 1175 1180 1185 Arg Tyr Val Ser Cys Arg Asp Glu Asn Gly Ser Val Ala Asp Glu 1190 1195 1200 Ser Ala Cys Ala Thr Leu Pro Arg Pro Val Ala Lys Glu Glu Cys 1205 1210 1215 Ser Val Thr Pro Cys Gly Gln Trp Lys Ala Leu Asp Trp Ser Ser 1220 1225 1230 Cys Ser Val Thr Cys Gly Gln Gly Arg Ala Thr Arg Gln Val Met 1235 1240 1245 Cys Val Asn Tyr Ser Asp His Val Ile Asp Arg Ser Glu Cys Asp 1250 1255 1260 Gln Asp Tyr Ile Pro Lys Thr Asp Gln Asp Cys Ser Met Ser Pro 1265 1270 1275 Cys Pro Gln Arg Thr Pro Asp Ser Gly Leu Ala Gln His Pro Phe 1280 1285 1290 Gln Asn Glu Asp Tyr Arg Pro Arg Ser Ala Ser Pro Ser Arg Thr 1295 1300 1305 His Val Leu Gly Gly Asn Gln Trp Arg Thr Gly Pro Trp Gly Ala 1310 1315 1320 Cys Ser Ser Thr Cys Ala Gly Gly Ser Gln Arg Arg Val Val Val 1325 1330 1335 Cys Gln Asp Glu Asn Gly Tyr Thr Ala Asn Asp Cys Val Glu Arg 1340 1345 1350 Ile Lys Pro Asp Glu Gln Arg Ala Cys Glu Ser Gly Pro Cys Pro 1355 1360 1365 Gln Trp Ala Tyr Gly Asn Trp Gly Glu Cys Thr Lys Leu Cys Gly 1370 1375 1380 Gly Gly Ile Arg Thr Arg Leu Val Val Cys Gln Arg Ser Asn Gly 1385 1390 1395 Glu Arg Phe Pro Asp Leu Ser Cys Glu Ile Leu Asp Lys Pro Pro 1400 1405 1410 Asp Arg Glu Gln Cys Asn Thr His Ala Cys Pro His Asp Ala Ala 1415 1420 1425 Trp Ser Thr Gly Pro Trp Ser Ser Cys Ser Val Ser Cys Gly Arg 1430 1435 1440 Gly His Lys Gln Arg Asn Val Tyr Cys Met Ala Lys Asp Gly Ser 1445 1450 1455 His Leu Glu Ser Asp Tyr Cys Lys His Leu Ala Lys Pro His Gly 1460 1465 1470 His Arg Lys Cys Arg Gly Gly Arg Cys Pro Lys Trp Lys Ala Gly 1475 1480 1485 Ala Trp Ser Gln Cys Ser Val Ser
Cys Gly Arg Gly Val Gln Gln 1490 1495 1500 Arg His Val Gly Cys Gln Ile Gly Thr His Lys Ile Ala Arg Glu 1505 1510 1515 Thr Glu Cys Asn Pro Tyr Thr Arg Pro Glu Ser Glu Arg Asp Cys 1520 1525 1530 Gln Gly Pro Arg Cys Pro Leu Tyr Thr Trp Arg Ala Glu Glu Trp 1535 1540 1545 Gln Glu Cys Thr Lys Thr Cys Gly Glu Gly Ser Arg Tyr Arg Lys 1550 1555 1560 Val Val Cys Val Asp Asp Asn Lys Asn Glu Val His Gly Ala Arg 1565 1570 1575 Cys Asp Val Ser Lys Arg Pro Val Asp Arg Glu Ser Cys Ser Leu 1580 1585 1590 Gln Pro Cys Glu Tyr Val Trp Ile Thr Gly Glu Trp Ser Glu Cys 1595 1600 1605 Ser Val Thr Cys Gly Lys Gly Tyr Lys Gln Arg Leu Val Ser Cys 1610 1615 1620 Ser Glu Ile Tyr Thr Gly Lys Glu Asn Tyr Glu Tyr Ser Tyr Gln 1625 1630 1635 Thr Thr Ile Asn Cys Pro Gly Thr Gln Pro Pro Ser Val His Pro 1640 1645 1650 Cys Tyr Leu Arg Asp Cys Pro Val Ser Ala Thr Trp Arg Val Gly 1655 1660 1665 Asn Trp Gly Ser Cys Ser Val Ser Cys Gly Val Gly Val Met Gln 1670 1675 1680 Arg Ser Val Gln Cys Leu Thr Asn Glu Asp Gln Pro Ser His Leu 1685 1690 1695 Cys His Thr Asp Leu Lys Pro Glu Glu Arg Lys Thr Cys Arg Asn 1700 1705 1710 Val Tyr Asn Cys Glu Leu Pro Gln Asn Cys Lys Glu Val Lys Arg 1715 1720 1725 Leu Lys Gly Ala Ser Glu Asp Gly Glu Tyr Phe Leu Met Ile Arg 1730 1735 1740 Gly Lys Leu Leu Lys Ile Phe Cys Ala Gly Met His Ser Asp His 1745 1750 1755 Pro Lys Glu Tyr Val Thr Leu Val His Gly Asp Ser Glu Asn Phe 1760 1765 1770 Ser Glu Val Tyr Gly His Arg Leu His Asn Pro Thr Glu Cys Pro 1775 1780 1785 Tyr Asn Gly Ser Arg Arg Asp Asp Cys Gln Cys Arg Lys Asp Tyr 1790 1795 1800 Thr Ala Ala Gly Phe Ser Ser Phe Gln Lys Ile Arg Ile Asp Leu 1805 1810 1815 Thr Ser Met Gln Ile Ile Thr Thr Asp Leu Gln Phe Ala Arg Thr 1820 1825 1830 Ser Glu Gly His Pro Val Pro Phe Ala Thr Ala Gly Asp Cys Tyr 1835 1840 1845 Ser Ala Ala Lys Cys Pro Gln Gly Arg Phe Ser Ile Asn Leu Tyr 1850 1855 1860 Gly Thr Gly Leu Ser Leu Thr Glu Ser Ala Arg Trp Ile Ser Gln 1865 1870 1875 Gly Asn Tyr Ala Val Ser Asp Ile Lys Lys Ser Pro Asp Gly Thr 1880 1885 1890 Arg Val Val Gly Lys Cys Gly Gly Tyr
Cys Gly Lys Cys Thr Pro 1895 1900 1905 Ser Ser Gly Thr Gly Leu Glu Val Arg Val Leu 1910 1915 11 314 PRT Homo sapiens misc_feature Incyte ID No 3965293CD1 11 Met Glu Asp Asp Ser Leu Tyr Leu Gly Gly Glu Trp Gln Phe Asn 1 5 10 15 His Phe Ser Lys Leu Thr Ser Ser Arg Pro Asp Ala Ala Phe Ala 20 25 30 Glu Ile Gln Arg Thr Ser Leu Pro Glu Lys Ser Pro Leu Ser Cys 35 40 45 Glu Thr Arg Val Asp Leu Cys Asp Asp Leu Ala Pro Val Ala Arg 50 55 60 Gln Leu Ala Pro Arg Glu Lys Leu Pro Leu Ser Ser Arg Arg Pro 65 70 75 Ala Ala Val Gly Ala Gly Leu Gln Asn Met Gly Asn Thr Cys Tyr 80 85 90 Val Asn Ala Ser Leu Gln Cys Leu Thr Tyr Thr Pro Pro Leu Ala 95 100 105 Asn Tyr Met Leu Ser Arg Glu His Ser Gln Thr Cys His Arg His 110 115 120 Lys Gly Cys Met Leu Cys Thr Met Gln Ala His Ile Thr Arg Ala 125 130 135 Leu His Asn Pro Gly His Val Ile Gln Pro Ser Gln Ala Leu Ala 140 145 150 Ala Gly Phe His Arg Gly Lys Gln Glu Asp Ala His Glu Phe Leu 155 160 165 Met Phe Thr Val Asp Ala Met Lys Lys Ala Cys Leu Pro Gly His 170 175 180 Lys Gln Val Asp His His Ser Lys Asp Thr Thr Leu Ile His Gln 185 190 195 Ile Phe Gly Gly Tyr Trp Arg Ser Gln Ile Lys Cys Leu His Cys 200 205 210 His Gly Ile Ser Asp Thr Phe Asp Pro Tyr Leu Asp Ile Ala Leu 215 220 225 Asp Ile Gln Ala Ala Gln Ser Val Gln Gln Ala Leu Glu Gln Leu 230 235 240 Val Lys Pro Glu Glu Leu Asn Gly Glu Asn Ala Tyr His Cys Gly 245 250 255 Val Cys Leu Gln Arg Ala Pro Ala Ser Lys Thr Leu Thr Leu His 260 265 270 Thr Ser Ala Lys Val Leu Ile Leu Val Leu Lys Arg Phe Ser Asp 275 280 285 Val Thr Gly Asn Leu Glu Pro Asn Ser Ala Arg Ala Arg Ala Glu 290 295 300 Arg Ser Gln Cys Ser Thr Ser Pro Cys Pro Ser Cys Arg Gly 305 310 12 437 PRT Homo sapiens misc_feature Incyte ID No 4948403CD1 12 Met Lys Cys Leu Gly Lys Arg Arg Gly Gln Ala Ala Ala Phe Leu 1 5 10 15 Pro Leu Cys Trp Leu Phe Leu Lys Ile Leu Gln Pro Gly His Ser 20 25 30 His Leu Tyr Asn Asn Arg Tyr Ala Gly Asp Lys Val Ile Arg Phe 35 40 45 Ile Pro Lys Thr Glu Glu Glu Ala Tyr Ala Leu Lys Lys Ile Ser 50 55 60 Tyr Gln Leu Lys Val Asp Leu Trp Gln Pro Ser
Ser Ile Ser Tyr 65 70 75 Val Ser Glu Gly Thr Val Thr Asp Val His Ile Pro Gln Asn Gly 80 85 90 Ser Arg Ala Leu Leu Ala Phe Leu Gln Glu Ala Asn Ile Gln Tyr 95 100 105 Lys Val Leu Ile Glu Asp Leu Gln Lys Thr Leu Glu Lys Gly Ser 110 115 120 Ser Leu His Thr Gln Arg Asn Arg Arg Ser Leu Ser Gly Tyr Asn 125 130 135 Tyr Glu Val Tyr His Ser Leu Glu Glu Ile Gln Asn Trp Met His 140 145 150 His Leu Asn Lys Thr His Ser Gly Leu Ile His Met Phe Ser Ile 155 160 165 Gly Arg Ser Tyr Glu Gly Arg Ser Leu Phe Ile Leu Lys Leu Gly 170 175 180 Arg Arg Ser Arg Leu Lys Arg Ala Val Trp Ile Asp Cys Gly Ile 185 190 195 His Ala Arg Glu Trp Ile Gly Pro Ala Phe Cys Gln Trp Phe Val 200 205 210 Lys Glu Ala Leu Leu Thr Tyr Lys Ser Asp Pro Ala Met Arg Lys 215 220 225 Met Leu Asn His Leu Tyr Phe Tyr Ile Met Pro Val Phe Asn Val 230 235 240 Asp Gly Tyr His Phe Ser Trp Thr Asn Asp Arg Phe Trp Arg Lys 245 250 255 Thr Arg Ser Arg Asn Ser Arg Phe Arg Cys Arg Gly Val Asp Ala 260 265 270 Asn Arg Asn Trp Lys Val Lys Trp Cys Asp Glu Gly Ala Ser Met 275 280 285 His Pro Cys Asp Asp Thr Tyr Cys Gly Pro Phe Pro Glu Ser Glu 290 295 300 Pro Glu Val Lys Ala Val Ala Asn Phe Leu Arg Lys His Arg Lys 305 310 315 His Ile Arg Ala Tyr Leu Ser Phe His Ala Tyr Ala Gln Met Leu 320 325 330 Leu Tyr Pro Tyr Ser Tyr Lys Tyr Ala Thr Ile Pro Asn Phe Arg 335 340 345 Cys Val Glu Ser Ala Ala Tyr Lys Ala Val Asn Ala Leu Gln Ser 350 355 360 Val Tyr Gly Val Arg Tyr Arg Tyr Gly Pro Ala Ser Thr Thr Leu 365 370 375 Tyr Val Ser Ser Gly Ser Ser Met Asp Trp Ala Tyr Lys Asn Gly 380 385 390 Ile Pro Tyr Ala Phe Ala Phe Glu Leu Arg Asp Thr Gly Tyr Phe 395 400 405 Gly Phe Leu Leu Pro Glu Met Leu Ile Lys Pro Thr Cys Thr Glu 410 415 420 Thr Met Leu Ala Val Lys Asn Ile Thr Met His Leu Leu Lys Lys 425 430 435 Cys Pro 13 742 PRT Homo sapiens misc_feature Incyte ID No 7473165CD1 13 Met Val Glu Ser Ala Gly Arg Ala Gly Gln Lys Arg Pro Gly Phe 1 5 10 15 Leu Glu Gly Gly Leu Leu Leu Leu Leu Leu Leu Val Thr Ala Ala 20 25 30 Leu Val Ala Leu Gly Val Leu Tyr Ala Asp Arg Arg Gly Ile Pro 35 40
45 Glu Ala Gln Glu Val Ser Glu Val Cys Thr Thr Pro Gly Cys Val 50 55 60 Ile Ala Ala Ala Arg Ile Leu Gln Asn Met Asp Pro Thr Thr Glu 65 70 75 Pro Cys Asp Asp Phe Tyr Gln Phe Ala Cys Gly Gly Trp Leu Arg 80 85 90 Arg His Val Ile Pro Glu Thr Asn Ser Arg Tyr Ser Ile Phe Asp 95 100 105 Val Leu Arg Asp Glu Leu Glu Val Ile Leu Lys Ala Val Leu Glu 110 115 120 Asn Ser Thr Ala Lys Asp Arg Pro Ala Val Glu Lys Ala Arg Thr 125 130 135 Leu Tyr Arg Ser Cys Met Asn Gln Ser Val Ile Glu Lys Arg Gly 140 145 150 Ser Gln Pro Leu Leu Asp Ile Leu Glu Val Val Gly Gly Trp Pro 155 160 165 Val Ala Met Asp Arg Trp Asn Glu Thr Val Gly Leu Glu Trp Glu 170 175 180 Leu Glu Arg Gln Leu Ala Leu Met Asn Ser Gln Phe Asn Arg Arg 185 190 195 Val Leu Ile Asp Leu Phe Ile Trp Asn Asp Asp Gln Asn Ser Ser 200 205 210 Arg His Ile Ile Tyr Ile Asp Gln Pro Thr Leu Gly Met Pro Ser 215 220 225 Arg Glu Tyr Tyr Phe Asn Gly Gly Ser Asn Arg Lys Val Arg Glu 230 235 240 Ala Tyr Leu Gln Phe Met Val Ser Val Ala Thr Leu Leu Arg Glu 245 250 255 Asp Ala Asn Leu Pro Arg Asp Ser Cys Leu Val Gln Glu Asp Met 260 265 270 Val Gln Val Leu Glu Leu Glu Thr Gln Leu Ala Lys Ala Thr Val 275 280 285 Pro Gln Glu Glu Arg His Asp Val Ile Ala Leu Tyr His Arg Met 290 295 300 Gly Leu Glu Glu Leu Gln Ser Gln Phe Gly Leu Lys Gly Phe Asn 305 310 315 Trp Thr Leu Phe Ile Gln Thr Val Leu Ser Ser Val Lys Ile Lys 320 325 330 Leu Leu Pro Asp Glu Glu Val Val Val Tyr Gly Ile Pro Tyr Leu 335 340 345 Gln Asn Leu Glu Asn Ile Ile Asp Thr Tyr Ser Ala Arg Thr Ile 350 355 360 Gln Asn Tyr Leu Val Trp Arg Leu Val Leu Asp Arg Ile Gly Ser 365 370 375 Leu Ser Gln Arg Phe Lys Asp Thr Arg Val Asn Tyr Arg Lys Ala 380 385 390 Leu Phe Gly Thr Met Val Glu Glu Val Arg Trp Arg Glu Cys Val 395 400 405 Gly Tyr Val Asn Ser Asn Met Glu Asn Ala Val Gly Ser Leu Tyr 410 415 420 Val Arg Glu Ala Phe Pro Gly Asp Ser Lys Ser Met Val Glu Leu 425 430 435 Ile Asp Lys Val Arg Thr Val Phe Val Glu Thr Leu Asp Glu Leu 440 445 450 Gly Trp Met Asp Glu Glu Ser Lys Lys Lys Ala Gln Glu Lys Ala 455 460 465 Met Ser Ile Arg
Glu Gln Ile Gly His Pro Asp Tyr Ile Leu Glu 470 475 480 Glu Met Asn Arg Arg Leu Asp Glu Glu Tyr Ser Asn Val Asn Phe 485 490 495 Ser Glu Asp Leu Tyr Phe Glu Asn Ser Leu Gln Asn Leu Lys Val 500 505 510 Gly Ala Gln Arg Ser Leu Arg Lys Leu Arg Glu Lys Val Asp Pro 515 520 525 Asn Leu Ile Ile Gly Ala Ala Val Val Asn Ala Phe Tyr Ser Pro 530 535 540 Asn Arg Asn Gln Ile Val Phe Pro Ala Gly Ile Leu Gln Pro Pro 545 550 555 Phe Phe Ser Lys Glu Gln Pro Gln Ala Leu Asn Phe Gly Gly Ile 560 565 570 Gly Met Val Ile Gly His Glu Ile Thr His Gly Phe Asp Asp Asn 575 580 585 Gly Arg Asn Phe Asp Lys Asn Gly Asn Met Met Asp Trp Trp Ser 590 595 600 Asn Phe Ser Thr Gln His Phe Arg Glu Gln Ser Glu Cys Met Ile 605 610 615 Tyr Gln Tyr Gly Asn Tyr Ser Trp Asp Leu Ala Asp Glu Gln Asn 620 625 630 Val Asn Gly Phe Asn Thr Leu Gly Glu Asn Ile Ala Asp Asn Gly 635 640 645 Gly Val Arg Gln Ala Tyr Lys Ala Tyr Leu Lys Trp Met Ala Glu 650 655 660 Gly Gly Lys Asp Gln Gln Leu Pro Gly Leu Asp Leu Thr His Glu 665 670 675 Gln Leu Phe Phe Ile Asn Tyr Ala Gln Val Trp Cys Gly Ser Tyr 680 685 690 Arg Pro Glu Phe Ala Ile Gln Ser Ile Lys Thr Asp Val His Ser 695 700 705 Pro Leu Lys Tyr Arg Val Leu Gly Ser Leu Gln Asn Leu Ala Ala 710 715 720 Phe Ala Asp Thr Phe His Cys Ala Arg Gly Thr Pro Met His Pro 725 730 735 Lys Glu Arg Cys Arg Val Trp 740 14 582 PRT Homo sapiens misc_feature Incyte ID No 7476667CD1 14 Met Phe Thr Leu Thr Thr Asn Gly Asp Leu Pro Arg Pro Ile Phe 1 5 10 15 Ile Pro Asn Gly Met Pro Asn Thr Val Val Pro Cys Gly Thr Glu 20 25 30 Lys Asn Phe Thr Asn Gly Met Val Asn Gly His Met Pro Ser Leu 35 40 45 Pro Asp Ser Pro Phe Thr Gly Tyr Ile Ile Ala Val His Arg Lys 50 55 60 Met Met Arg Thr Glu Leu Tyr Phe Leu Ser Ser Gln Lys Asn Arg 65 70 75 Pro Ser Leu Phe Gly Met Pro Leu Ile Val Pro Cys Thr Val His 80 85 90 Thr Arg Lys Lys Asp Leu Tyr Asp Ala Val Trp Ile Gln Val Ser 95 100 105 Arg Leu Ala Ser Pro Leu Pro Pro Gln Glu Ala Ser Asn His Ala 110 115 120 Gln Asp Cys Asp Asp Ser Met Gly Tyr Gln Tyr Pro Phe Thr Leu 125 130 135 Arg Val Val Gln Lys
Asp Gly Asn Ser Cys Ala Trp Cys Pro Trp 140 145 150 Tyr Arg Phe Cys Arg Gly Cys Lys Ile Asp Cys Gly Glu Asp Arg 155 160 165 Ala Phe Ile Gly Asn Ala Tyr Ile Ala Val Asp Trp Asp Pro Thr 170 175 180 Ala Leu His Leu Arg Tyr Gln Thr Ser Gln Glu Arg Val Val Asp 185 190 195 Glu His Glu Ser Val Glu Gln Ser Arg Arg Ala Gln Ala Glu Pro 200 205 210 Ile Asn Leu Asp Ser Cys Leu Arg Ala Phe Thr Ser Glu Glu Glu 215 220 225 Leu Gly Glu Asn Glu Met Tyr Tyr Cys Ser Lys Cys Lys Thr His 230 235 240 Cys Leu Ala Thr Lys Lys Leu Asp Leu Trp Arg Leu Pro Pro Ile 245 250 255 Leu Ile Ile His Leu Lys Arg Phe Gln Phe Val Asn Gly Arg Trp 260 265 270 Ile Lys Ser Gln Lys Ile Val Lys Phe Pro Arg Glu Ser Phe Asp 275 280 285 Pro Ser Ala Phe Leu Val Pro Arg Asp Pro Ala Leu Cys Gln His 290 295 300 Lys Pro Leu Thr Pro Gln Gly Asp Glu Leu Ser Glu Pro Arg Ile 305 310 315 Leu Ala Arg Glu Val Lys Lys Val Asp Ala Gln Ser Ser Ala Gly 320 325 330 Glu Glu Asp Val Leu Leu Ser Lys Ser Pro Ser Ser Leu Ser Ala 335 340 345 Asn Ile Ile Ser Ser Pro Lys Gly Ser Pro Ser Ser Ser Arg Lys 350 355 360 Ser Gly Thr Ser Cys Pro Ser Ser Lys Asn Ser Ser Pro Asn Ser 365 370 375 Ser Pro Arg Thr Leu Gly Arg Ser Lys Gly Arg Leu Arg Leu Pro 380 385 390 Gln Ile Gly Ser Lys Asn Lys Leu Ser Ser Ser Lys Glu Asn Leu 395 400 405 Asp Ala Ser Lys Glu Asn Gly Ala Gly Gln Ile Cys Glu Leu Ala 410 415 420 Asp Ala Leu Ser Arg Gly His Val Leu Gly Gly Ser Gln Pro Glu 425 430 435 Leu Val Thr Pro Gln Asp His Glu Val Ala Leu Ala Asn Gly Phe 440 445 450 Leu Tyr Glu His Glu Ala Cys Gly Asn Gly Tyr Ser Asn Gly Gln 455 460 465 Leu Gly Asn His Ser Glu Glu Asp Ser Thr Asp Asp Gln Arg Glu 470 475 480 Asp Thr Arg Ile Lys Pro Ile Tyr Asn Leu Tyr Ala Ile Ser Cys 485 490 495 His Ser Gly Ile Leu Gly Gly Gly His Tyr Val Thr Tyr Ala Lys 500 505 510 Asn Pro Asn Cys Lys Trp Tyr Cys Tyr Asn Asp Ser Ser Cys Lys 515 520 525 Glu Leu His Pro Asp Glu Ile Asp Thr Asp Ser Ala Tyr Ile Leu 530 535 540 Phe Tyr Glu Gln Gln Gly Ile Asp Tyr Ala Gln Phe Leu Pro Lys 545 550 555 Thr Asp Gly Lys Lys Met Ala Asp
Thr Ser Ser Met Asp Glu Asp 560 565 570 Phe Glu Ser Asp Tyr Lys Lys Tyr Cys Val Leu Gln 575 580 15 290 PRT Homo sapiens misc_feature Incyte ID No 7479166CD1 15 Met Leu Ser Pro Pro Gln Pro Arg Thr Pro Asp Cys Arg Leu Gln 1 5 10 15 Ala Ser Leu Glu Ala Leu Ala Thr Leu Ala Pro Gln Pro Ser Asp 20 25 30 Trp Leu Cys Phe Ala Asp Leu Gly Trp Phe Glu Ala Asp Gly Ala 35 40 45 Ala His Ser Met Gly Leu Gly Ser Ser Leu Lys Trp Ala Trp Ala 50 55 60 Lys Pro Ser Gly Met Pro Val Pro Glu Asn Asp Leu Val Gly Ile 65 70 75 Val Gly Gly His Asn Ala Pro Pro Gly Lys Trp Pro Trp Gln Val 80 85 90 Ser Leu Arg Val Tyr Ser Tyr His Trp Ala Ser Trp Ala His Ile 95 100 105 Cys Gly Gly Ser Leu Ile His Pro Gln Trp Val Leu Thr Ala Ala 110 115 120 His Cys Ile Phe Trp Lys Asp Thr Asp Pro Ser Ile Tyr Arg Ile 125 130 135 His Ala Gly Asp Val Tyr Leu Tyr Gly Gly Arg Gly Leu Leu Asn 140 145 150 Val Ser Arg Ile Ile Val His Pro Asn Tyr Val Thr Ala Gly Leu 155 160 165 Gly Ala Asp Val Ala Leu Leu Gln Leu Pro Gly Ser Pro Leu Ser 170 175 180 Pro Glu Ser Leu Pro Pro Pro Tyr Arg Leu Gln Gln Ala Ser Val 185 190 195 Gln Val Leu Glu Asn Ala Val Cys Glu Gln Pro Tyr Arg Asn Ala 200 205 210 Ser Gly His Thr Gly Asp Arg Gln Leu Ile Leu Asp Asp Met Leu 215 220 225 Cys Ala Gly Ser Glu Gly Arg Asp Ser Cys Tyr Gly Asp Ser Gly 230 235 240 Gly Pro Leu Val Cys Arg Leu Arg Gly Ser Trp Arg Leu Val Gly 245 250 255 Val Val Ser Trp Gly Tyr Gly Cys Thr Leu Arg Asp Phe Pro Gly 260 265 270 Val Tyr Thr His Val Gln Ile Tyr Val Leu Trp Ile Leu Gln Gln 275 280 285 Val Gly Glu Leu Pro 290 16 708 PRT Homo sapiens misc_feature Incyte ID No 3671788CD1 16 Met Ala Ser Ser Ser Gly Arg Val Thr Ile Gln Leu Val Asp Glu 1 5 10 15 Glu Ala Gly Val Gly Ala Gly Arg Leu Gln Leu Phe Arg Gly Gln 20 25 30 Ser Tyr Glu Ala Ile Arg Ala Ala Cys Leu Asp Ser Gly Ile Leu 35 40 45 Phe Arg Asp Pro Tyr Phe Pro Ala Gly Pro Asp Ala Leu Gly Tyr 50 55 60 Asp Gln Leu Gly Pro Asp Ser Glu Lys Ala Lys Gly Val Lys Trp 65 70 75 Met Arg Pro His Glu Phe Cys Ala Glu Pro Lys Phe Ile Cys Glu 80 85 90 Asp Met Ser Arg
Thr Asp Val Cys Gln Gly Ser Leu Gly Asn Cys 95 100 105 Trp Phe Leu Ala Ala Ala Ala Ser Leu Thr Leu Tyr Pro Arg Leu 110 115 120 Leu Arg Arg Val Val Pro Pro Gly Gln Asp Phe Gln His Gly Tyr 125 130 135 Ala Gly Val Phe His Phe Gln Leu Trp Gln Phe Gly Arg Trp Met 140 145 150 Asp Val Val Val Asp Asp Arg Leu Pro Val Arg Glu Gly Lys Leu 155 160 165 Met Phe Val Arg Ser Glu Gln Arg Asn Glu Phe Trp Ala Pro Leu 170 175 180 Leu Glu Lys Ala Tyr Ala Lys Leu His Gly Ser Tyr Glu Val Met 185 190 195 Arg Gly Gly His Met Asn Glu Ala Phe Val Asp Phe Thr Gly Gly 200 205 210 Val Gly Glu Val Leu Tyr Leu Arg Gln Asn Ser Met Gly Leu Phe 215 220 225 Ser Ala Leu Arg His Ala Leu Ala Lys Glu Ser Leu Val Gly Ala 230 235 240 Thr Ala Leu Ser Asp Arg Gly Glu Tyr Arg Thr Glu Glu Gly Leu 245 250 255 Val Lys Gly His Ala Tyr Ser Ile Thr Gly Thr His Lys Val Phe 260 265 270 Leu Gly Phe Thr Lys Val Arg Leu Leu Arg Leu Arg Asn Pro Trp 275 280 285 Gly Cys Val Glu Trp Thr Gly Ala Trp Ser Asp Ser Cys Pro Arg 290 295 300 Trp Asp Thr Leu Pro Thr Glu Cys Arg Asp Ala Leu Leu Val Lys 305 310 315 Lys Glu Asp Gly Glu Phe Trp Met Glu Leu Arg Asp Phe Leu Leu 320 325 330 His Phe Asp Thr Val Gln Ile Cys Ser Leu Ser Pro Glu Val Leu 335 340 345 Gly Pro Ser Pro Glu Gly Gly Gly Trp His Val His Thr Phe Gln 350 355 360 Gly Arg Trp Val Arg Gly Phe Asn Ser Gly Gly Ser Gln Pro Asn 365 370 375 Ala Glu Thr Phe Trp Thr Asn Pro Gln Phe Arg Leu Thr Leu Leu 380 385 390 Glu Pro Asp Glu Glu Asp Asp Glu Asp Glu Glu Gly Pro Trp Gly 395 400 405 Gly Trp Gly Ala Ala Gly Ala Arg Gly Pro Ala Arg Gly Gly Arg 410 415 420 Thr Pro Lys Cys Thr Val Leu Leu Ser Leu Ile Gln Arg Asn Arg 425 430 435 Arg Arg Leu Arg Ala Lys Gly Leu Thr Tyr Leu Thr Val Gly Phe 440 445 450 His Val Phe Gln Ala Glu Gly Ser Thr Gly Thr Asp Asn Glu Arg 455 460 465 Thr His Gly Phe Thr Gly His Arg Gly Ala Gln Leu Ala Gly His 470 475 480 Thr His Gly Pro Gln Glu Ala Ser Lys Arg Tyr Thr Gln Asn Ser 485 490 495 Ala Glu Val Ala Pro Asp Arg Glu Ala Asp Asp Asp Gly Gly Gln 500 505 510 Gly Phe Gly Asp Gly Pro Trp
Glu Ile Asp Asp Val Ile Ser Ala 515 520 525 Asp Leu Gln Ser Leu Gln Gly Pro Tyr Leu Pro Leu Glu Leu Gly 530 535 540 Leu Glu Gln Leu Phe Gln Glu Leu Ala Gly Glu Glu Glu Glu Leu 545 550 555 Asn Ala Ser Gln Leu Gln Ala Leu Leu Ser Ile Ala Leu Glu Pro 560 565 570 Ala Arg Ala His Thr Ser Thr Pro Arg Glu Ile Gly Leu Arg Thr 575 580 585 Cys Glu Gln Leu Leu Gln Cys Phe Gly His Gly Gln Ser Leu Ala 590 595 600 Leu His His Phe Gln Gln Leu Trp Gly Tyr Leu Leu Glu Trp Gln 605 610 615 Ala Ile Phe Asn Lys Phe Asp Glu Asp Thr Ser Gly Thr Met Asn 620 625 630 Ser Tyr Glu Leu Arg Leu Ala Leu Asn Ala Ala Gly Phe His Leu 635 640 645 Asn Asn Gln Leu Thr Gln Thr Leu Thr Ser Arg Tyr Arg Asp Ser 650 655 660 Arg Leu Arg Val Asp Phe Glu Arg Phe Val Ser Cys Val Ala His 665 670 675 Leu Thr Cys Ile Phe Cys His Cys Ser Gln His Leu Asp Gly Gly 680 685 690 Glu Gly Val Ile Cys Leu Thr His Arg Gln Trp Met Glu Val Ala 695 700 705 Thr Phe Ser 17 649 PRT Homo sapiens misc_feature Incyte ID No 7479181CD1 17 Met Glu Leu Gly Cys Trp Thr Gln Leu Gly Leu Thr Phe Leu Gln 1 5 10 15 Leu Leu Leu Ile Ser Ser Leu Pro Arg Glu Tyr Thr Val Ile Asn 20 25 30 Glu Ala Cys Pro Gly Ala Glu Trp Asn Ile Met Cys Arg Glu Cys 35 40 45 Cys Glu Tyr Asp Gln Ile Glu Cys Val Cys Pro Gly Lys Arg Glu 50 55 60 Val Val Gly Tyr Thr Ile Pro Cys Cys Arg Asn Glu Glu Asn Glu 65 70 75 Cys Asp Ser Cys Leu Ile His Pro Gly Cys Thr Ile Phe Glu Asn 80 85 90 Cys Lys Ser Cys Arg Asn Gly Ser Trp Gly Gly Thr Leu Asp Asp 95 100 105 Phe Tyr Val Lys Gly Phe Tyr Cys Ala Glu Cys Arg Ala Gly Trp 110 115 120 Tyr Gly Gly Asp Cys Met Arg Cys Gly Gln Val Leu Arg Ala Pro 125 130 135 Lys Gly Gln Ile Leu Leu Glu Ser Tyr Pro Leu Asn Ala His Cys 140 145 150 Glu Trp Thr Ile His Ala Lys Pro Gly Phe Val Ile Gln Leu Arg 155 160 165 Phe Val Met Leu Ser Leu Glu Phe Asp Tyr Met Cys Gln Tyr Asp 170 175 180 Tyr Val Glu Val Arg Asp Gly Asp Asn Arg Asp Gly Gln Ile Ile 185 190 195 Lys Arg Val Cys Gly Asn Glu Arg Pro Ala Pro Ile Gln Ser Ile 200 205 210 Gly Ser Ser Leu His Val Leu Phe His Ser Asp Gly Ser
Lys Asn 215 220 225 Phe Asp Gly Phe His Ala Ile Tyr Glu Glu Ile Thr Ala Cys Ser 230 235 240 Ser Ser Pro Cys Phe His Asp Gly Thr Cys Val Leu Asp Lys Ala 245 250 255 Gly Ser Tyr Lys Cys Ala Cys Leu Ala Gly Tyr Thr Gly Gln Arg 260 265 270 Cys Glu Asn Pro Cys Arg Glu Pro Lys Ile Ser Asp Leu Val Arg 275 280 285 Arg Arg Val Leu Pro Met Gln Val Gln Ser Arg Glu Thr Pro Leu 290 295 300 His Gln Leu Tyr Ser Ala Ala Phe Ser Lys Gln Lys Leu Gln Ser 305 310 315 Ala Pro Thr Lys Lys Pro Ala Leu Pro Phe Gly Asp Leu Pro Met 320 325 330 Gly Tyr Gln His Leu His Thr Gln Leu Gln Tyr Glu Cys Ile Ser 335 340 345 Pro Phe Tyr Arg Arg Leu Gly Ser Ser Arg Arg Thr Cys Leu Arg 350 355 360 Thr Gly Lys Trp Ser Gly Arg Ala Pro Ser Cys Ile Pro Ile Cys 365 370 375 Gly Lys Ile Glu Asn Ile Thr Ala Pro Lys Thr Gln Gly Leu Arg 380 385 390 Trp Pro Trp Gln Ala Ala Ile Tyr Arg Arg Thr Ser Gly Val His 395 400 405 Asp Gly Ser Leu His Lys Gly Ala Trp Phe Leu Val Cys Ser Gly 410 415 420 Ala Leu Val Asn Glu Arg Thr Val Val Val Ala Ala His Cys Val 425 430 435 Thr Asp Leu Gly Lys Val Thr Met Ile Lys Thr Ala Asp Leu Lys 440 445 450 Val Val Leu Gly Lys Phe Tyr Arg Asp Asp Asp Arg Asp Glu Lys 455 460 465 Thr Ile Gln Ser Leu Gln Ile Ser Ala Ile Ile Leu His Pro Asn 470 475 480 Tyr Asp Pro Ile Leu Leu Asp Ala Asp Ile Ala Ile Leu Lys Leu 485 490 495 Leu Asp Lys Ala Arg Ile Ser Thr Arg Val Gln Pro Ile Cys Leu 500 505 510 Ala Ala Ser Arg Asp Leu Ser Thr Ser Phe Gln Glu Ser His Ile 515 520 525 Thr Val Ala Gly Trp Asn Val Leu Ala Asp Val Arg Ser Pro Gly 530 535 540 Phe Lys Asn Asp Thr Leu Arg Ser Gly Val Val Ser Val Val Asp 545 550 555 Ser Leu Leu Cys Glu Glu Gln His Glu Asp His Gly Ile Pro Val 560 565 570 Ser Val Thr Asp Asn Met Phe Cys Ala Ser Trp Glu Pro Thr Ala 575 580 585 Pro Ser Asp Ile Cys Thr Ala Glu Thr Gly Gly Ile Ala Ala Val 590 595 600 Ser Phe Pro Gly Arg Ala Ser Pro Glu Pro Arg Trp His Leu Met 605 610 615 Gly Leu Val Ser Trp Ser Tyr Asp Lys Thr Cys Ser His Arg Leu 620 625 630 Ser Thr Ala Phe Thr Lys Val Leu Pro Phe Lys Asp Trp Ile Glu 635
640 645 Arg Asn Met Lys 18 918 PRT Homo sapiens misc_feature Incyte ID No 6621372CD1 18 Met Pro Gly Gly Ala Gly Ala Ala Arg Leu Cys Leu Leu Ala Phe 1 5 10 15 Ala Leu Gln Pro Leu Arg Pro Arg Ala Ala Arg Glu Pro Gly Trp 20 25 30 Thr Arg Gly Ser Glu Glu Gly Ser Pro Lys Leu Gln His Glu Leu 35 40 45 Ile Ile Pro Gln Trp Lys Thr Ser Glu Ser Pro Val Arg Glu Lys 50 55 60 His Pro Leu Lys Ala Glu Leu Arg Val Met Ala Glu Gly Arg Glu 65 70 75 Leu Ile Leu Asp Leu Glu Lys Asn Glu Gln Leu Phe Ala Pro Ser 80 85 90 Tyr Thr Glu Thr His Tyr Thr Ser Ser Gly Asn Pro Gln Thr Thr 95 100 105 Thr Arg Lys Leu Glu Asp His Cys Phe Tyr His Gly Thr Val Arg 110 115 120 Glu Thr Glu Leu Ser Ser Val Thr Leu Ser Thr Cys Arg Gly Ile 125 130 135 Arg Gly Leu Ile Thr Val Ser Ser Asn Leu Ser Tyr Val Ile Glu 140 145 150 Pro Leu Pro Asp Ser Lys Gly Gln His Leu Ile Tyr Arg Ser Glu 155 160 165 His Leu Lys Pro Pro Pro Gly Asn Cys Gly Phe Glu His Ser Lys 170 175 180 Pro Thr Thr Arg Asp Trp Ala Leu Gln Phe Thr Gln Gln Thr Lys 185 190 195 Lys Arg Pro Arg Arg Met Lys Arg Glu Asp Leu Asn Ser Met Lys 200 205 210 Tyr Val Glu Leu Tyr Leu Val Ala Asp Tyr Leu Glu Phe Gln Lys 215 220 225 Asn Arg Arg Asp Gln Asp Ala Thr Lys His Lys Leu Ile Glu Ile 230 235 240 Ala Asn Tyr Val Asp Lys Phe Tyr Arg Ser Leu Asn Ile Arg Ile 245 250 255 Ala Leu Val Gly Leu Glu Val Trp Thr His Gly Asn Met Cys Glu 260 265 270 Val Ser Glu Asn Pro Tyr Ser Thr Leu Trp Ser Phe Leu Ser Trp 275 280 285 Arg Arg Lys Leu Leu Ala Gln Lys Tyr His Asp Asn Ala Gln Leu 290 295 300 Ile Thr Gly Met Ser Phe His Gly Thr Thr Ile Gly Leu Ala Pro 305 310 315 Leu Met Ala Met Cys Ser Val Tyr Gln Ser Gly Gly Val Asn Met 320 325 330 Asp His Ser Glu Asn Ala Ile Gly Val Ala Ala Thr Met Ala His 335 340 345 Glu Met Gly His Asn Phe Gly Met Thr His Asp Ser Ala Asp Cys 350 355 360 Cys Ser Ala Ser Ala Ala Asp Gly Gly Cys Ile Met Ala Ala Ala 365 370 375 Thr Gly His Pro Phe Pro Lys Val Phe Asn Gly Cys Asn Arg Arg 380 385 390 Glu Leu Asp Arg Tyr Leu Gln Ser Gly Gly Gly Met Cys Leu Ser 395 400 405 Asn Met Pro
Asp Thr Arg Met Leu Tyr Gly Gly Arg Arg Cys Gly 410 415 420 Asn Gly Tyr Leu Glu Asp Gly Glu Glu Cys Asp Cys Gly Glu Glu 425 430 435 Glu Glu Cys Asn Asn Pro Cys Cys Asn Ala Ser Asn Cys Thr Leu 440 445 450 Arg Pro Gly Ala Glu Cys Ala His Gly Ser Cys Cys His Gln Cys 455 460 465 Lys Leu Leu Ala Pro Gly Thr Leu Cys Arg Glu Gln Ala Arg Gln 470 475 480 Cys Asp Leu Pro Glu Phe Cys Thr Gly Lys Ser Pro His Cys Pro 485 490 495 Thr Asn Phe Tyr Gln Met Asp Gly Thr Pro Cys Glu Gly Gly Gln 500 505 510 Ala Tyr Cys Tyr Asn Gly Met Cys Leu Thr Tyr Gln Glu Gln Cys 515 520 525 Gln Gln Leu Trp Gly Pro Gly Ala Arg Pro Ala Pro Asp Leu Cys 530 535 540 Phe Glu Lys Val Asn Val Ala Gly Asp Thr Phe Gly Asn Cys Gly 545 550 555 Lys Asp Met Asn Gly Glu His Arg Lys Cys Asn Met Arg Asp Ala 560 565 570 Lys Cys Gly Lys Ile Gln Cys Gln Ser Ser Glu Ala Arg Pro Leu 575 580 585 Glu Ser Asn Ala Val Pro Ile Asp Thr Thr Ile Ile Met Asn Gly 590 595 600 Arg Gln Ile Gln Cys Arg Gly Thr His Val Tyr Arg Gly Pro Glu 605 610 615 Glu Glu Gly Asp Met Leu Asp Pro Gly Leu Val Met Thr Gly Thr 620 625 630 Lys Cys Gly Tyr Asn His Ile Cys Phe Glu Gly Gln Cys Arg Asn 635 640 645 Thr Ser Phe Phe Glu Thr Glu Gly Cys Gly Lys Lys Cys Asn Gly 650 655 660 His Gly Val Cys Asn Asn Asn Gln Asn Cys His Cys Leu Pro Gly 665 670 675 Trp Ala Pro Pro Phe Cys Asn Thr Pro Gly His Gly Gly Ser Ile 680 685 690 Asp Ser Gly Pro Met Pro Pro Glu Ser Val Gly Pro Val Val Ala 695 700 705 Gly Val Leu Val Ala Ile Leu Val Leu Ala Val Leu Met Leu Met 710 715 720 Tyr Tyr Cys Cys Arg Gln Asn Asn Lys Leu Gly Gln Leu Lys Pro 725 730 735 Ser Ala Leu Pro Ser Lys Leu Arg Gln Gln Phe Ser Cys Pro Phe 740 745 750 Arg Val Ser Gln Asn Ser Gly Thr Gly His Ala Asn Pro Thr Phe 755 760 765 Lys Leu Gln Thr Pro Gln Gly Lys Arg Lys Val Ile Asn Thr Pro 770 775 780 Glu Ile Leu Arg Lys Pro Ser Gln Pro Pro Pro Arg Pro Pro Pro 785 790 795 Asp Tyr Leu Arg Gly Gly Ser Pro Pro Ala Pro Leu Pro Ala His 800 805 810 Leu Ser Arg Ala Ala Arg Asn Ser Pro Gly Pro Gly Ser Gln Ile 815 820 825 Glu Arg Thr Glu Ser Ser
Arg Arg Pro Pro Pro Ser Arg Pro Ile 830 835 840 Pro Pro Ala Pro Asn Cys Ile Val Ser Gln Asp Phe Ser Arg Pro 845 850 855 Arg Pro Pro Gln Lys Ala Leu Pro Ala Asn Pro Val Pro Gly Arg 860 865 870 Arg Ser Leu Pro Arg Pro Gly Gly Ala Ser Pro Leu Arg Pro Pro 875 880 885 Gly Ala Gly Pro Gln Gln Ser Arg Pro Leu Ala Ala Leu Ala Pro 890 895 900 Lys Phe Pro Glu Tyr Arg Ser Gln Arg Ala Gly Gly Met Ile Ser 905 910 915 Ser Lys Ile 19 218 PRT Homo sapiens misc_feature Incyte ID No 4847254CD1 19 Met Arg Gln Gly Pro Tyr Leu Pro Leu Glu Leu Gly Leu Glu Gln 1 5 10 15 Leu Phe Gln Glu Leu Ala Gly Glu Glu Glu Glu Leu Asn Ala Ser 20 25 30 Gln Leu Gln Ala Leu Leu Ser Ile Ala Leu Glu Pro Ala Arg Ala 35 40 45 His Thr Ser Thr Pro Arg Glu Ile Gly Leu Arg Thr Cys Glu Gln 50 55 60 Leu Leu Gln Cys Phe Gly Val His Gly Gly Gln Cys Leu Gly Glu 65 70 75 Gly Gly Ser Gly Glu Gly Asp Val Gly Val Ser Pro Pro Leu Leu 80 85 90 Glu Arg Leu Thr Leu Thr Arg Cys Pro Arg Pro Pro Thr Gln His 95 100 105 Gly Gln Ser Leu Ala Leu His His Phe Gln Gln Leu Trp Gly Tyr 110 115 120 Leu Leu Glu Trp Gln Ala Ile Phe Asn Lys Phe Asp Glu Asp Thr 125 130 135 Ser Gly Thr Met Asn Ser Tyr Glu Leu Arg Leu Ala Leu Asn Ala 140 145 150 Ala Gly Phe His Leu Asn Asn Gln Leu Thr Gln Thr Leu Thr Ser 155 160 165 Arg Tyr Arg Asp Ser Arg Leu Arg Val Asp Phe Glu Arg Phe Val 170 175 180 Ser Cys Val Ala His Leu Thr Cys Ile Phe Cys His Cys Ser Gln 185 190 195 His Leu Asp Gly Gly Glu Gly Val Ile Cys Leu Thr His Arg Gln 200 205 210 Trp Met Glu Val Ala Thr Phe Ser 215 20 656 PRT Homo sapiens misc_feature Incyte ID No 5776350CD1 20 Met Lys Leu Glu Pro Leu Gln Glu Arg Glu Pro Ala Pro Glu Glu 1 5 10 15 Asn Leu Thr Trp Ser Ser Ser Gly Gly Asp Glu Lys Val Leu Pro 20 25 30 Ser Ile Pro Leu Arg Cys His Ser Ser Ser Ser Pro Val Cys Pro 35 40 45 Arg Arg Lys Pro Arg Pro Arg Pro Gln Pro Arg Ala Arg Ser Arg 50 55 60 Ser Gln Pro Gly Leu Ser Ala Pro Pro Pro Pro Pro Ala Arg Pro 65 70 75 Pro Pro Pro Pro Pro Pro Pro Pro Pro Pro Ala Pro Arg Pro Arg 80 85 90 Ala Trp Arg Gly Ser Arg Arg Arg Ser Arg
Pro Gly Ser Arg Pro 95 100 105 Gln Thr Arg Arg Ser Cys Ser Gly Asp Leu Asp Gly Ser Gly Asp 110 115 120 Pro Gly Gly Leu Gly Asp Trp Leu Leu Glu Val Glu Phe Gly Gln 125 130 135 Gly Pro Thr Gly Cys Ser His Val Glu Ser Phe Lys Val Gly Lys 140 145 150 Asn Trp Gln Lys Asn Leu Arg Leu Ile Tyr Gln Arg Phe Val Trp 155 160 165 Ser Gly Thr Pro Glu Thr Arg Lys Arg Lys Ala Lys Ser Cys Ile 170 175 180 Cys His Val Cys Ser Thr His Met Asn Arg Leu His Ser Cys Leu 185 190 195 Ser Cys Val Phe Phe Gly Cys Phe Thr Glu Lys His Ile His Lys 200 205 210 His Ala Glu Thr Lys Gln His His Leu Ala Val Asp Leu Tyr His 215 220 225 Gly Val Ile Tyr Cys Phe Met Cys Lys Asp Tyr Val Tyr Asp Lys 230 235 240 Asp Ile Glu Gln Ile Ala Lys Glu Thr Lys Glu Lys Ile Leu Arg 245 250 255 Leu Leu Thr Ser Thr Ser Thr Asp Val Ser His Gln Gln Phe Met 260 265 270 Thr Ser Gly Phe Glu Asp Lys Gln Ser Thr Cys Glu Thr Lys Glu 275 280 285 Gln Glu Pro Lys Leu Val Lys Pro Lys Lys Lys Arg Arg Lys Lys 290 295 300 Ser Val Tyr Thr Val Gly Leu Arg Gly Leu Ile Asn Leu Gly Asn 305 310 315 Thr Cys Phe Met Asn Cys Ile Val Gln Ala Leu Thr His Ile Pro 320 325 330 Leu Leu Lys Asp Phe Phe Leu Ser Asp Lys His Lys Cys Ile Met 335 340 345 Thr Ser Pro Ser Leu Cys Leu Val Cys Glu Met Ser Ser Leu Phe 350 355 360 His Ala Met Tyr Ser Gly Ser Arg Thr Pro His Ile Pro Tyr Lys 365 370 375 Leu Leu His Leu Ile Trp Ile His Ala Glu His Leu Ala Gly Tyr 380 385 390 Arg Gln Gln Asp Ala His Glu Phe Leu Ile Ala Ile Leu Asp Val 395 400 405 Leu His Arg His Ser Lys Asp Asp Ser Gly Gly Gln Glu Ala Asn 410 415 420 Asn Pro Asn Cys Cys Asn Cys Ile Ile Asp Gln Ile Phe Thr Gly 425 430 435 Gly Leu Gln Ser Asp Val Thr Cys Gln Ala Cys His Ser Val Ser 440 445 450 Thr Thr Ile Asp Pro Cys Trp Asp Ile Ser Leu Asp Leu Pro Gly 455 460 465 Ser Cys Ala Thr Phe Asp Ser Gln Asn Pro Glu Arg Ala Asp Ser 470 475 480 Thr Val Ser Arg Asp Asp His Ile Pro Gly Ile Pro Ser Leu Thr 485 490 495 Asp Cys Leu Gln Trp Phe Thr Arg Pro Glu His Leu Gly Ser Ser 500 505 510 Ala Lys Ile Lys Cys Asn Ser Cys Gln Ser Tyr Gln Glu
Ser Thr 515 520 525 Lys Gln Leu Thr Met Lys Lys Leu Pro Ile Val Ala Cys Phe His 530 535 540 Leu Lys Arg Phe Glu His Val Gly Lys Gln Arg Arg Lys Ile Asn 545 550 555 Thr Phe Ile Ser Phe Pro Leu Glu Leu Asp Met Thr Pro Phe Leu 560 565 570 Ala Ser Thr Lys Glu Ser Arg Met Lys Glu Gly Gln Pro Pro Thr 575 580 585 Asp Cys Val Pro Asn Glu Asn Lys Tyr Ser Leu Phe Ala Val Ile 590 595 600 Asn His His Gly Thr Leu Glu Ser Gly His Tyr Thr Ser Phe Ile 605 610 615 Arg Gln Gln Lys Asp Gln Trp Phe Ser Cys Asp Asp Ala Ile Ile 620 625 630 Thr Lys Ala Thr Ile Glu Asp Leu Leu Tyr Ser Glu Gly Tyr Leu 635 640 645 Leu Phe Tyr His Lys Gln Gly Leu Glu Lys Asp 650 655 21 509 PRT Homo sapiens misc_feature Incyte ID No 7473300CD1 21 Met Leu Leu Thr Gln Ser Leu Phe Gly Gly Leu Phe Thr Arg Thr 1 5 10 15 Arg Glu Thr Val Cys Ile Phe Gln Pro Trp Thr Gln Gln Arg Val 20 25 30 Thr Thr Asn Arg Ser Trp Thr His Pro Glu Thr Gln Ala Glu Arg 35 40 45 Leu Trp Ile Lys Gln Glu Thr Glu Asp Arg Asp Arg Ser Ser Phe 50 55 60 Tyr Ile Gln Met Asn Lys Gly Arg Pro Trp Val Tyr Leu Lys Tyr 65 70 75 Gln Ile Val Gly Ala Trp Ile Gln Pro Glu Leu Asp Val Ile His 80 85 90 Ser Phe Ile Gln Ser Glu Thr Phe Leu Leu Arg Phe Trp Pro Lys 95 100 105 Val Leu Ser Pro Val Val Lys Pro Trp Ile Leu Leu Lys Gly Arg 110 115 120 Thr Leu Ile Ser Trp Ile Leu Pro Val Thr Arg Ala Asp Thr Gly 125 130 135 Ser Ser Leu Lys Phe Ile Leu Leu Asn Pro Ser Val Phe Leu Lys 140 145 150 Pro Ala Asn His Leu Ser Thr Trp Asp Arg Arg His Thr Leu Leu 155 160 165 His Leu Asp Asn Phe Val Val Val Val Leu Ala Val Glu Ser Pro 170 175 180 Gly Ile Val Gln Lys Arg His Leu Ser Ile Leu Gln Val Ser Thr 185 190 195 Cys Ala Gln Phe Trp Leu Lys Leu Asn Glu Leu Thr Phe Trp Val 200 205 210 Glu Ala Lys Lys Ala Met Trp Met Ala Asp Tyr Gln Gly Val Thr 215 220 225 Gln Ser Ser Tyr Ala Pro Trp Tyr Lys Gln Gly Pro Met Thr Thr 230 235 240 Ser Ala Ser Met Ser His Ser Val Ser Thr Ser Thr Asn Ala Ser 245 250 255 Ala Phe Thr Ser Thr Pro Ala Ser Leu Trp Pro His Phe Ser Leu 260 265 270 Pro Gln Pro Gln Ser Lys Ala Gln Lys
Leu Gly Arg Asp Gln Ile 275 280 285 Tyr Leu Arg Tyr Ala Met Pro Trp Lys Ala Val Ile Ile Ile Cys 290 295 300 Gly Ser Gln Ile Cys Ser Gly Ser Ile Val Gly Ser Ser Trp Ile 305 310 315 Leu Thr Ala Ala His Cys Val Arg Lys Leu Arg Asp Pro Glu Asp 320 325 330 Thr Ala Val Ile Leu Gly Leu Arg His Pro Gly Ala Pro Leu Arg 335 340 345 Val Val Lys Val Ser Thr Ile Leu Leu His Glu Arg Phe Trp Leu 350 355 360 Val Thr Glu Ala Ala Arg Asn Ile Leu Glu Leu Leu Leu Leu His 365 370 375 Asp Val Gln Thr Pro Ile Trp Leu Leu Ser Leu Leu Gly Tyr Leu 380 385 390 Arg Asn Leu Asn Ser Ser Glu Cys Trp Leu Ser Arg Pro His Ile 395 400 405 Val Thr Pro Ala Val Leu Leu Arg His Pro Trp Ala Pro Gly Gly 410 415 420 Pro Gln Pro His Pro Gly Thr Gly Pro Leu Pro Gln Ile Gln Ala 425 430 435 Gln Gln Pro Asn Leu Gln Ile His His Val Ala Gln Gln Asp Phe 440 445 450 Ile Ile Cys Asp Pro Gly Pro Tyr Leu Gly Pro Ser Leu Glu His 455 460 465 His Val Phe Leu Gly Trp Leu Pro Ala Thr Leu Leu Leu Gly Pro 470 475 480 Arg Arg Pro Pro Pro Ala Ala Ser His Pro Glu Leu Ala Ala Ala 485 490 495 Lys Thr Trp Leu Trp Pro Gly Asn Arg Gly Cys Pro Val Ala 500 505 22 2789 DNA Homo sapiens misc_feature Incyte ID No 5155802CB1 22 ctctttctct ctccctctgg catgcatgct gctggtagga gacccccaag tcaacattgc 60 ttcagaaatc ctttagcact catttctcag gagaacttat ggcttcagaa tcacagctcg 120 gtttttaaga tggacataac ctgtacgacc ttctgatggg ctttcaactt tgaactggat 180 gtggacactt ttctctcaga tgacagaatt actccaactt cccctttgca gttgcttcct 240 ttccttgaag gtagctgtat cttattttct ttaaaaagct ttttcttcca aagccacttg 300 ccatgccgac cgtcattagc gcatctgtgg ctccaaggac agcggctgag ccccggtccc 360 cagggccagt tcctcacccg gcccagagca aggccactga ggctgggggt ggaaacccaa 420 gtggcatcta ttcagccatc atcagccgca attttcctat tatcggagtg aaagagaaga 480 cattcgagca acttcacaag aaatgtctag aaaagaaagt tctttatgtg gaccctgagt 540 tcccaccgga tgagacctct ctcttttata gccagaagtt ccccatccag ttcgtctgga 600 agagacctcc ggaaatttgc gagaatcccc gatttatcat tgatggagcc aacagaactg 660 acatctgtca aggagagcta ggggactgct ggtttctcgc agccattgcc tgcctgaccc 720
tgaaccagcaccttcttttc cgagtcatac cccatgatca aagtttcatc gaaaactacg 780 cagggatcttccacttccag ttctggcgct atggagagtg ggtggacgtg gttatagatg 840 actgcctgccaacgtacaac aatcaactgg ttttcaccaa gtccaaccac cgcaatgagt 900 tctggagtgctctgctggag aaggcttatg ctaagctcca tggttcctac gaagctctga 960 aaggtgggaacaccacagag gccatggagg acttcacagg aggggtgaca gagttttttg 1020 agatcagggatgctcctagt gacatgtaca agatcatgaa gaaagccatc gagagaggct 1080 ccctcatgggctgctccatt gatacaatca ttccggttca gtatgagaca agaatggcct 1140 gcgggctggtcagaggtcac gcctactctg tcacggggct ggatgaggtc ccgttcaaag 1200 gtgagaaagtgaagctggtg cggctgcgga atccgtgggg ccaggtggag tggaacggtt 1260 cttggagtgatagatggaag gactggagct ttgtggacaa agatgagaag gcccgtctgc 1320 agcaccaggtcactgaggat ggagagttct ggatgtccta tgaggatttc atctaccatt 1380 tcacaaagttggagatctgc aacctcacgg ccgatgctct gcagtctgac aagcttcaga 1440 cctggacagtgtctgtgaac gagggccgct gggtacgggg ttgctctgcc ggaggctgcc 1500 gcaacttcccagatactttc tggaccaacc ctcagtaccg tctgaagctc ctggaggagg 1560 acgatgaccctgatgactcg gaggtgattt gcagcttcct ggtggccctg atgcagaaga 1620 accggcggaaggaccggaag ctaggggcca gtctcttcac cattggcttc gccatctacg 1680 aggttcccaaagagatgcac gggaacaagc agcacctgca gaaggacttc ttcctgtaca 1740 acgcctccaaggccaggagc aaaacctaca tcaacatgcg ggaggtgtcc cagcgcttcc 1800 gcctgcctcccagcgagtac gtcatcgtgc cctccaccta cgagccccac caggaggggg 1860 aattcatcctccgggtcttc tctgaaaaga ggaacctctc tgaggaagtt gaaaatacca 1920 tctccgtggatcggccagtg cccatcatct tcgtttcgga cagagcaaac agcaacaagg 1980 agctgggtgtggaccaggag tcagaggagg gcaaaggcaa aacaagccct gataagcaaa 2040 agcagtccccacagccacag cctggcagct ctgatcagga aagtgaggaa cagcaacaat 2100 tccggaacattttcaagcag atagcaggag atgacatgga gatctgtgca gatgagctca 2160 agaaggtccttaacacagtc gtgaacaaac acaaggacct gaagacacac gggttcacac 2220 tggagtcctgccgtagcatg attgcgctca tggatacaga tggctctgga aagctcaacc 2280 tgcaggagttccaccacctc tggaacaaga ttaaggcctg gcagaaaatt ttcaaacact 2340 atgacacagaccagtccggc accatcaaca gctacgagat gcgaaatgca gtcaacgacg 2400 caggattccacctcaacaac cagctctatg acatcattac 
catgcggtac gcagacaaac 2460 acatgaacatcgactttgac agtttcatct gctgcttcgt taggctggag ggcatgttca 2520 gagcttttcatgcatttgac aaggatggag atggtatcat caagctcaac gttctggagt 2580 ggctgcagctcaccatgtat gcctgaacca ggctggcctc atccaaagcc atgcaggatc 2640 actcaggatttcagtttcac cctctatttc caaagccatt tacctcaaag gacccagcag 2700 ctacacccctacaggcttcc aggcacctca tcagtcatgt tcctcctcca ttttaccccc 2760 tacccatccttgatcggtca tgcctagcc 2789 23 2267 DNA Homo sapiens misc_feature IncyteID No 71269782CB1 23 gtaagtgaca caacttgaaa ctgcttggcc ctctttaaaaagaaataata aaatgggaga 60 gaatgaagca agtttaccta acacgtcttt gcaaggtaaaaagatggcct atcagaaggt 120 ccatgcagat caaagagctc caggacactc acagtacttagacaatgatg accttcaagc 180 cactgccctt gacttagagt gggacatgga gaaggaactagaggagtctg gttttgacca 240 attccagcta gacggtgctg agaatcagaa cctagggcattcagagacta tagacctcaa 300 tcttgattcc attcaaccag caacttcacc caaaggaaggttccagagac ttcaagaaga 360 atctgactac attacccatt atacacgatc tgcaccaaagagcaatcgct gcaacttttg 420 ccacgtctta aaaatacttt gcacagccac cattttatttatttttggga ttttgatagg 480 ttattatgta catacaaatt gcccttcaga tgctccatcttcaggaacag ttgatcctca 540 gttatatcaa gagattctca agacaatcca ggcagaagatattaagaagt ctttcagaaa 600 tttggtacaa ctatataaaa atgaagatga catggaaatttcaaagaaga ttaagactca 660 gtggacctct ttgggcctag aagatgtaca gtttgtaaattactctgtgc tgcttgatct 720 gccaggccct tctcccagca ctgtgactct gagcagcagtggtcaatgct ttcatcctaa 780 tggccagcct tgcagtgaag aagccagaaa agatagcagccaagacctgc tctattcata 840 tgcagcctat tctgccaaag gaactctcaa ggctgaagtcatcgatgtga gttatggaat 900 ggcagatgat ttaaaaagga ttaggaaaat aaaaaacgtaacaaatcaga tcgcactcct 960 gaaattagga aaattgccac tgctttataa gctttcctcattggaaaagg ctggatttgg 1020 aggtgttctt ctgtatatcg atccttgtga tttgccaaagactgtgaatc ctagccatga 1080 taccttcatg gtgtcactga atccaggagg agacccttctacgcctggtt acccaagtgt 1140 cgatgaaagt tttagacaaa gccgatcaaa cctcacctctctattagtgc agcccatctc 1200 tgcatccctc gttgcaaaac tgatctcttc gccaaaagctagaaccaaaa atgaagcgtg 1260 tagctctcta gagcttccaa ataatgaaat aagagtcgtcagcatgcaag ttcagacagt 
1320 cacaaaattg aaaacagtta ctaatgttgt tggatttgtaatgggcttga catctccaga 1380 ccggtatatc atagttggca gccatcatca cactgcacacagttataatg gacaagaatg 1440 ggccagtagt actgcaataa tcacagcgtt tatccgtgccttgatgtcaa aagttaagag 1500 agggtggaga ccagaccgaa ctattgtttt ctgttcttggggaggaacag cttttggcaa 1560 tattggctca tatgaatggg gagaggattt caagaaggttcttcaaaaaa atgttgtggc 1620 ttatattagc ctccacagtc ccataagggg gaactctagtctgtatcctg tagcatcacc 1680 atctcttcag caactggtag tagaggtaag acaaaccactattgtatcaa atgattatgc 1740 aaaaccgacc ttttctctat attttgacat ttcttgattttttcatttat ttttaaatat 1800 gcatcaaatg ttgtataagt gttttaagaa atgatctattgctgacattt tatcaatata 1860 ccttaactaa tttcttgtgt tctggaattc ttcacttgctactcttttat ggtcatattt 1920 ctagaagaca tgagtcacac agttatagag aaggtatacaaaaatatatt tttaaaaaat 1980 atatgaattt agctctcaaa ttcccaattc tgtaatcttgacattttatg ataagcctgg 2040 ttacttttga atttcttcct cttcattctt gttttaagtaaatgtgagac ctgtcctatc 2100 tttacaactg ctgtgtaggc cccccgagag caagaatatagtgataacta aatttaaaag 2160 atttagaaaa tattgtttga aaaattacct gtggaaaaagaaaacatgtt ttcttagtat 2220 cctgaaaaat catatatttt ttatgtttca ttggagttacttatttt 2267 24 963 DNA Homo sapiens misc_feature Incyte ID No7472651CB1 24 atgggggacc cagaaggaag cgcagagtgg ggttggggga aggggataccggtggtcaga 60 agaaatttat taacagtgga tgggataagt ctgtgtctgg agggatcctggtggaggcag 120 aagggtcctg cctcacctgg attctctcac tccctcccca gactgcagccgaaccctggt 180 ccctcctcca caatgtggct tctcctcact ctctccttcc tgctggcatccacagcagcc 240 caggatggtg acaagttgct ggaaggtgac gagtgtgcac cccactcccagccatggcaa 300 gtggctctct acgagcgtgg acgctttaac tgtggcgctt ccctcatctccccacactgg 360 gtgctgtctg cggcccactg ccaaagccgc ttcatgagag tgcgcctgggagagcacaac 420 ctgcgcaagc gcgatggccc agagcaacta cggaccacgt ctcgggtcattccacacccg 480 cgctacgaag cgcgcagcca ccgcaacgac atcatgttgc tgcgcctagtccagcccgca 540 cgcctgaacc cccaggtgcg ccccgcggtg ctacccacgc gttgcccccacccgggggag 600 gcctgtgtgg tgtctggctg gggcctggtg tcccacaacg agcctgggaccgctgggagc 660 ccccggtcac aagtgagtct cccagatacg ttgcattgtg ccaacatcagcattatctcg 
720 gacacatctt gtgacaagag ctacccaggg cgcctgacaa acaccatggtgtgtgcaggc 780 gcggagggca gaggcgcaga atcctgtgag ggtgactctg ggggacccctggtctgtggg 840 ggcatcctgc agggcattgt gtcctggggt gacgtccctt gtgacaacaccaccaagcct 900 ggtgtctata ccaaagtctg ccactacttg gagtggatca gggaaaccatgaagaggaac 960 tga 963 25 1137 DNA Homo sapiens misc_feature Incyte IDNo 7478251CB1 25 atggctgaga aaccatccaa cggtgttctg gtccacatgg tgaagttgctgatcaagacc 60 tttctagatg gcatttttga tgatttgatg gaaaataatg tattaaatacagatgagata 120 caccttatag gaaaatgtct aaagtttgtg gtgagcaatg ctgaaaacctggttgatgat 180 atcactgaga cagctcaaac tgcaggcaaa atatttaggg aacacctgtggaattccaaa 240 aaacagctga gttcaatttt tttctctctt tcagcttttc tggaaatccagggtgcccaa 300 cccagtggca agttaaagct ttgtcctcat gctcacttcc atgaactaaagacaaaaagg 360 gcagatgaga tatatccagt gatggagaaa aaaaggcgaa catgcctgggcctcaacatc 420 cgcaacaaag aattcaacta tcttcataat cgaaatggtt ctgaacttgaccttttgggg 480 atgcgagatc tacttgaaaa ccttggatac tcagtggtta taaaagagaatctcacagct 540 caggaaatgg aaacagcact aaggcagttt gctgctcacc cagagcaccagtcctcagac 600 agcacattcc tggtgtttat gtcacatagc atcctgaatg gaatctgtgggaccaagcac 660 tgggatcaag agccagatgt tcttcacgat gacaccatct ttgaaattttcaacaaccgt 720 aactgccaga gtctgaaaga caaacccaag gtcatcatca tgcaagcctgccgaggcaat 780 ggtgctggga ttgtttggtt caccactgac agtggaaaag ccggtgcagatactcatggt 840 cggctcttgc aaggtaacat ctgtaatgat gctgttacaa aggctcatgtggaaaaggac 900 ttcattgctt tcaaatcttc cacaccacat aatgtttctt ggagacatgaaacaaatggc 960 tctgtcttca tttcccaaat tatctactac ttcagagagt attcttggagtcatcatcta 1020 gaggaaattt ttcaaaaggt acaacattca tttgagaccc caaatatactgacccagctg 1080 cccaccattg aaagactatc catgacacga tatttctatc tctttcctgggaattaa 1137 26 3204 DNA Homo sapiens misc_feature Incyte ID No2759385CB1 26 gccagcgcgc caccatgggc agtcccggtt tccccttgta aagatggcggtgagggatcg 60 ctgcaacctt tagactaatg actgtccgaa acatcgcctc catctgtaatatgggcacca 120 atgcctctgc tctggaaaaa gacattggtc cagagcagtt tccaatcaatgaacactatt 180 tcggattggt caattttgga aacacatgct actgtaactc cgtgcttcaggcattgtact 
240 tctgccgtcc attccgggag aatgtgttgg catacaaggc ccagcaaaagaagaaggaaa 300 acttgctgac gtgcctggcg gaccttttcc acagcattgc cacacagaagaagaaggttg 360 gcgtcatccc accaaagaag ttcatttcaa ggctgagaaa agagaatgatctctttgata 420 actacatgca gcaggatgct catgaatttt taaattattt gctaaacactattgcggaca 480 tccttcagga ggagaagaaa caggaaaaac aaaatggaaa attaaaaaatggcaacatga 540 acgaacctgc ggaaaataat aaaccagaac tcacctgggt ccatgagatttttcagggaa 600 cgcttaccaa tgaaactcga tgcttgaact gtgaaactgt tagtagcaaagatgaagatt 660 ttcttgacct ttctgttgat gtggagcaga atacatccat tacccactgtctaagagact 720 tcagcaacac agaaacactg tgtagtgaac aaaaatatta ttgtgaaacatgctgcagca 780 aacaagaagc ccagaaaagg atgagggtaa aaaagctgcc catgatcttggccctgcacc 840 taaagcggtt caagtacatg gagcagctgc acagatacac caagctgtcttaccgtgtgg 900 tcttccctct ggaactccgg ctcttcaaca cctccagtga tgcagtgaacctggaccgca 960 tgtatgactt ggttgcggtg gtcgttcact gtggcagtgg tcctaatcgtgggcattata 1020 tcactattgt gaaaagtcac ggcttctggc ttttgtttga tgatgacattgtagagaaaa 1080 tagatgctca agctattgaa gaattctatg gcctgacgtc agatatatcaaaaaattcag 1140 aatctggata tattttattc tatcagtcaa gagagtaact gaaagacctgcgggactgat 1200 tcacgtgggg agaatgttca cagcactgtc acccggcttc tccgcaggctttcctcttcc 1260 ccagtggccc actaatggta tcactccgag tctcaatggt ctggctgtgttagactctct 1320 ccttttgtgt ttttacatgc agcactactc ttggttttat ttcagtctgacatagagtta 1380 actgcaatca gattgtagtc tgatttatat gaataacggt tgctaattttaggactgggt 1440 gaaagctatg ccattcatta tgtctggctg tattagaatg acatttcctatgaatgtcta 1500 cggtctgttt taggtgtttg ctaaacttct atggcttcca gggtcttcttacaatgcatt 1560 cctttaactt gtccctggaa gcattgctac ccattttcag cttctctgcctctcttctga 1620 tacaaggaca gaagaattgg gtagatattc accttttagg ggtgcaagtatagctttaag 1680 tttgtgcaag tgaaaatgtt gaaaagtgag taacctcgat attaaaatcatccttgacat 1740 gaaacagggt gaagagaagc tgtccgtggc ggctggtgtt ggctggcattggcactgggc 1800 tgtgctgacc tagccattac aattccaggg gctaagaagg ctcagggcagacaaagtcaa 1860 gaggaggaag tttttgtgga caatgaaaag ttattttcgt acctttctaccaaaaccaag 1920 tttcaggaaa ataactctat gttgtttatt ttcagtgaca 
cttatgtaaggccctgtgag 1980 ttgtatttat cctgtatccg gcactgctaa gcttttcaag gtatctttccaatcctgctg 2040 atgtggcagt caatggctgc agggctggca aacctcccct tagccagtcagcacggcatt 2100 gttccttatc aggaataaca aaggtactac atctttccag ccatcagcacgttgtacaac 2160 ttaacttttt aacatagtcc gtctgtttac tgaggcactg gcgagtcccagggctgatac 2220 agaaccttcc ctagagggaa taccagagtt agctgggtat agaggtggctcaaaggaagt 2280 gtccgtgggc agggggagga atgaacaaaa tggcgctgtt tctttggctcagactcctag 2340 aatgcttgac aagacagaat ttttttggaa gaacctcatc tcactatagttacttttttc 2400 acttttgtta tatatgtatt tattagagca tttgaatatt ggtacctttaaaagggtcat 2460 ttggtgtttt gctgttgagc tggtttttga gtcatagatc ttggcttcctttagaagcca 2520 cttaacttcc atacactata ataaactgtg aacatatttt tgttacctaatgcatccact 2580 gatgaacatg caaactttgg gcataatgtg aactaaaatt gaaatggaaaatgttagtgg 2640 ccattttgca acaatgaaga ggatagcact ttatctagat gaaaactggatttcttatct 2700 ttgaaatatc ttgaactgtt tattgctcag aacttaagta agcatgccaacatttcgttt 2760 gtttatgctt gaagtgaaat gttttacttt tcactggaga agacaaaacagggtgatctt 2820 catgttattg ttttatacaa gtgatggaaa tgtaccttgc cttgtttagaggcaatttca 2880 catttataaa tatttttttt tcctccatga aacttacgca gtaatcactacctggaaggt 2940 gagttttgat ctctttttaa ggagaggcac tttccaactg aaggtgattgatgtagggaa 3000 atgtttgtac tatatagaat ccatatattt gactgcaagt tacaaagttttaagaacatg 3060 atggttggtc tctaatatat ttggaactga ttcataagaa aagttattaaaattatcttt 3120 gaaacacctc ttgaagctaa tttattagaa aaaatatttc agttggaaggctgtagaagt 3180 aatgtttaaa tgctaagtca taag 3204 27 1641 DNA Homo sapiensmisc_feature Incyte ID No 4226182CB1 27 attttaatct atacattgaa acgatttgtcactgtcactc aacaaagtat tttttatcag 60 aatattggag caaagccttt ggcaaacatagccagatgtg gtgagaacac taaaggcatt 120 aaaaactttg atctattaga tatgtttcagatatcaagag tgtttaatct aattaatact 180 aatatgtcat attaaataat attccaagtttgaaacaatt gaggacatat ggaaagatca 240 tacctcaatt tgcttcagat ttggattttatgaactgcag acttaaatta ttagcaggaa 300 ttctcatttt taaattgtct gttaaaatcaattataaatg taaatttatt tatttagtta 360 tatggattat cctcgttatt tgggagcagtgtttcctgga acaatgtgta ttactcgtta 
420 ttctgcagga gttgcattgc aatgtggacctgcaagctgt tgtgattttc gaacttgtgt 480 actgaaagac ggagcaaaat gttataaaggactgtgctgc aaagactgtc aaattttaca 540 atcaggcgtt gaatgtaggc cgaaagcacatcctgaatgt gacatcgctg aaaattgtaa 600 tggaagctca ccagaatgtg gtcctgacataactttaatc aatggacttt catgcaaaaa 660 taataagttt atttgttatg acggagactgccatgatctc gatgcacgtt gtgagagtgt 720 atttggaaaa ggttcaagaa atgctccatttgcctgctat gaagaaatac aatctcaatc 780 agacagattt gggaactgtg gtagggatagaaataacaaa tatgtgttct gtggatggag 840 gaatcttata tgtggaagat tagtttgtacctaccctact cgaaagcctt tccatcaaga 900 aaatggtgat gtgatttatg ctttcgtacgagattctgta tgcataactg tagactacaa 960 attgcctcga acagttccag atccactggctgtcaaaaat ggctctcagt gtgatattgg 1020 gagggtttgt gtaaatcgtg aatgtgtagaatcaaggata attaaggctt cagcacatgt 1080 ttgttcacaa cagtgttctg gacatggagtgtgtgattcc agaaacaagt gccattgttc 1140 gccaggctat aagcctccaa actgccaaatacgttccaaa ggattttcca tatttcctga 1200 ggaagatatg ggttcaatca tggaaagagcatctgggaag actgaaaaca cctggcttct 1260 aggtttcctc attgctcttc ctattctcattgtaacaacc gcaatagttt tggcaaggaa 1320 acagttgaaa aagtggttcg ccaaggaagaggaattccca agtagcgaat ctaaatcgga 1380 aggtagcaca cagacatatg ccagccaatccagctcagaa ggcagcactc agacatatgc 1440 cagccaaacc agatcagaaa gcagcagtcaagctgatact agcaaatcca aatcacagga 1500 cagtacccaa acacaaagca gtagtaactagtgattcctt cagaaggcaa cggataacat 1560 cgagagtctc gctaagaaat gaaaattctgtctttccttc cgtggtcaca gctgaaagaa 1620 acaataaatt gagtgtggat c 1641 281983 DNA Homo sapiens misc_feature Incyte ID No 5078962CB1 28 cctgattggtgctaccaagc ccaaaataga cccatggtaa aacaccaaat tcggccatgc 60 agcaacccctgtaatgtttt actgagataa aaaagatgcg agttgaccta cttttcaagc 120 ttttcaaggttcaactgatg aaccttttga gcatttattc acatgcgctg ggtagcccag 180 ggcaccaatcattgagaaag agtaaggaat tgccgaagaa cataattttg aaatcctcag 240 gccaaaagggagttatgtca tttaatgact cacaaatgat ttagaggatc gtagggttta 300 acatttctatttcctaatgg tccataacac catcatatgc ccaaatgatt gtccacaagg 360 cacagttgaggattctaaca ctaatcataa ttaattcaaa tgttgtacca taactttatc 420 atagtaaatttatacagtct 
cacatggaag tactgttgct atagcatagt tgataaatac 480 aagaaatgtcttcaattgtt gctgcacaat ttctttattt aacattttag gttgacatga 540 caactgaagagatagatgct cttgttcatc gggaaatcat cagtcataat gcctatccct 600 cacctctaggctatggaggt tttccaaaat ctgtttgtac ctctgtaaac aacgtgctct 660 gtcatggtattcctgacagt cgacctcttc aggatggaga tattatcaac attgatgtca 720 cagtctattacaatggctac catggagaca cctctgaaac atttttggtg ggcaatgtgg 780 acgaatgtggtaaaaagtta gtggaggttg ccaggaggtg tagagatgaa gcaattgcag 840 cttgcagagcaggggctccc ttctctgtaa ttggaaacac aatcagccac ataactcatc 900 agaatggttttcaagtctgt ccacattttg tgggacatgg aataggatct tactttcatg 960 gacatccagaaatttggcat catgcaaacg acagtgatct acccatggag gagggcatgg 1020 cattcactatagagccaatc atcacggagg gatcccctga atttaaagtc ctggaggatg 1080 catggactgtggtctcccta gacaatcaaa ggtcggcgca gttcgagcac acggttctga 1140 tcacgtcgaggggcgcgcag atcctgacca aactacccca tgaggcctga ggagccgccc 1200 gaaggtcgcggtgacctggt gcctttttaa ataaattgct gaaatttggc tggagaactt 1260 ttagaagaaacagggaaatg accggtggtg cggtaacctg cgtggctcct gatagcgttt 1320 ggaagaacgcgggggagact gaagagcaac tgggaactcg gatctgaagc cctgctgggg 1380 tcgcgcggctttggaaaaac aaatcctggc cctggactcg gtttcccagc gcggtcaacg 1440 catctggaggggactggagg aaaccccctt gttggaagag attccaagag aagcacggtt 1500 ttctctttcccttgccctga ctgttggagt aaaaaacctc ttaaatccat tgtatcagag 1560 gtccttacctctctgacagt tacaatgatc tttgtatctg aactttgcac gtctgccgaa 1620 aaatccgaacctgttgactg ggatttttaa gaatccgttt ctcccttttg tgtattccat 1680 attggccggccccaaggatg ctcgcagaag ccagccccca accccagccc ttccgtatct 1740 ttcccctccatcgcggcttt gcgatgaaag attagcccgc gaacagaggc attgattaca 1800 aacatgtccttggccagtgg actctgggcc tggccattct tcaggtttct gtcaatccag 1860 aaacgcgactttcctggacc cctgcggctc ttccttccca ccagctcagc atcacagccc 1920 atccagaggccaagtccaag aaggaataac agtaatgagg gaaccttccg agcaaaaacg 1980 caa 1983 291574 DNA Homo sapiens misc_feature Incyte ID No 7474340CB1 29 gccagggccaagatggatct tctcctcgac atcagctaag cctggaggac tcttcccctc 60 agagaccatggagagggaca gccacgggaa tgcatctcca gcaagaacac cttcagctgg 120 
agcatctccagcccaggcat ctccagctgg gacacctcca ggccgggcat ctccagccca 180 ggcatctccagcccaggcat ctccagctgg gacacctccg ggccgggcat ctccagccca 240 ggcatctccagctggtacac ctccaggccg ggcatctcca ggccgggcat ctccagccca 300 ggcatctccagcccgggcat ctccggctct ggcatcactt tccaggtcct catccggcag 360 gtcatcatccgccaggtcag cctcggtgac aacctcccca accagagtgt accttgttag 420 agcaacaccagtgggggctg tacccatccg atcatctcct gccaggtcag caccagcaac 480 cagggccaccagggagagcc caggtacgag cctgcccaag ttcacctggc gggagggcca 540 gaagcagctaccgctcatcg ggtgcgtgct cctcctcatt gccctggtgg tttcgctcat 600 catcctcttccagttctggc agggccacac agggatcagg tacaaggagc agagggagag 660 ctgtcccaagcacgctgttc gctgtgacgg ggtggtggac tgcaagctga agagtgacga 720 gctgggctgcgtgaggtttg actgggacaa gtctctgctt aaaatctact ctgggtcctc 780 ccatcagtggcttcccatct gtagcagcaa ctggaatgac tcctactcag agaagacctg 840 ccagcagctgggtttcgaga gtgctcaccg gacaaccgag gttgcccaca gggattttgc 900 caacagcttctcaatcttga gatacaactc caccatccag gaaagcctcc acaggtctga 960 atgcccttcccagcggtata tctccctcca gtgttcccac tgcggactga gggccatgac 1020 cgggcggatcgtgggagggg cgctggcctc ggatagcaag tggccttggc aagtgagtct 1080 gcacttcggcaccacccaca tctgtggagg cacgctcatt gacgcccagt gggtgctcac 1140 tgccgcccactgcttcttcg tgacccggga gaaggtcctg gagggctgga aggtgtacgc 1200 gggcaccagcaacctgcacc agttgcctga ggcagcctcc attgccgaga tcatcatcaa 1260 cagcaattacaccgatgagg aggacgacta tgacatcgcc ctcatgcggc tgtccaagcc 1320 cctgaccctgtccggtgagg gaatctgcac tccccgctct cctgcccccc agccccagca 1380 ccctctgcagccctcgcact tgtcagcatc tgtcaactca tatccgggcc ccaaagcttc 1440 tgcagggcagaagtcaaaga ctcttaaaga tccttacatg gaacacttct gttttataat 1500 tagggaaactgaagcccaag ggttataaat aagtttgctc caaatgacac atctcacatt 1560 acaaattgatgacg 1574 30 1173 DNA Homo sapiens misc_feature Incyte ID No 7477287CB130 atggggccaa gactcattcc gtttctattt ttgtttgttt accctattct ctgcaggatc 60attctgagga aaggcaagtc tatccgccag agaatggagg agcagggtgt actggagacg 120tttctgaggg accacccaaa ggctgatcca attgccaagt attatttcaa taatgatgct 180gttgcttatg agcccttcac caactacctg gattctttct 
actttgggga gatcagcact 240gggacaccac cccaaaattt cctagtctct ttgatacggg ttcctccaat ctgtagcctg 300ccctccatct actgccagag ccaagtctgc tccaatcaca acaggttcaa tcccagcctg 360tcctccacct tcagaaacga tggacaaacc tatggactat cctatgggag tggcagcctg 420agtgtgttcc tgggctatga cactgtgact gttcataaca tcgttgtcaa taaccaggag 480tttggcctga gtgagaatga gcccagcgac cccttttact attcagactt tgacgggatc 540ctgggaatgg cctacccaaa catggcagag gggaattccc ctacagtaat gcaggggatg 600ctgcagcaga gccagcttac tcagcccgtc ttcagcttct acttcacctg ccagccaacc 660cgccagtatt gtggagagct catccttgga ggtgtggacc ccaaccttta ttctggtcag 720atcatctgga cccctgtcag cccggaactg tactggcaga ttgccatcga ggaatttgcc 780atcggtaacc aggccactgg cttgtgctct gagggttgcc aggccattgt ggataccgag 840accttcctgc tggcagttcc tcagcagtac atggcctcct tcctgcaggc aacaggaccc 900cagcaggctc agaatggtga ctttgtggtc aactgcagct acatacagag catgcccacc 960atcaccttca tcatcggcgg ggcccagttt cctctgcctc cctctgaata tgtcttcaat 1020aacaatggct actgcaggct tggaactgag gccacctgcc tgccctcccg cagtgggcag 1080cccctctgga ttctggggga tgtcttcctc aaggaatatt gctctgtcta tgacatggcc 1140aacaacaggg tgggctttgc cttctctgcc tag 1173 31 6013 DNA Homo sapiensmisc_feature Incyte ID No 2994162CB1 31 gcgggacctg gccgagatgg ggagcccagacgccgcggcg gccgtgcgca aggacaggct 60 gcacccgagg caagtgaaat tattagagaccctgagcgaa tacgaaatcg tgtctcccat 120 ccgagtgaac gctctcggag aaccctttcccacgaacgtc cacttcaaaa gaacgcgacg 180 gagcattaac tctgccactg acccctggcctgccttcgcc tcctcctctt cctcctctac 240 ctcctcccag gcgcattacc gcctctctgccttcggccag cagtttctat ttaatctcac 300 cgccaatgcc ggatttatcg ctccactgttcactgtcacc ctcctcggga cgcccggggt 360 gaatcagacc aagttttatt ccgaagaggaagcggaactc aagcactgtt tctacaaagg 420 ctatgtcaat accaactccg agcacacggccgtcatcagc ctctgctcag gaatgctggg 480 cacattccgg tctcatgatg gggattattttattgaacca ctacagtcta tggatgaaca 540 agaagatgaa gaggaacaaa acaaaccccacatcatttat aggcgcagcg ccccccagag 600 agagccctca acaggaaggc atgcatgtgacacctcagaa cacaaaaata ggcacagtaa 660 agacaagaag aaaaccagag caagaaaatggggagaaagg attaacctgg ctggtgacgt 720 
agcagcatta aacagcggct tagcaacagaggcattttct gcttatggta ataagacgga 780 caacacaaga gaaaagagga cccacagaaggacaaaacgt tttttatcct atccacggtt 840 tgtagaagtc ttggtggtgg cagacaacagaatggtttca taccatggag aaaaccttca 900 acactatatt ttaactttaa tgtcaattgtagcctctatc tataaagacc caagtattgg 960 aaatttaatt aatattgtta ttgtgaacttaattgtgatt cataatgaac aggatgggcc 1020 ttccatatct tttaatgctc agacaacattaaaaaacttt tgccagtggc agcattcgaa 1080 gaacagtcca ggtggaatcc atcatgatactgctgttctc ttaacaagac aggatatctg 1140 cagagctcac gacaaatgtg ataccttaggcctggctgaa ctgggaacca tttgtgatcc 1200 ctatagaagc tgttctatta gtgaagatagtggattgagt acagctttta cgatcgccca 1260 tgagctgggc catgtgttta acatgcctcatgatgacaac aacaaatgta aagaagaagg 1320 agttaagagt ccccagcatg tcatggctccaacactgaac ttctacacca acccctggat 1380 gtggtcaaag tgtagtcgaa aatatatcactgagttttta gacactggtt atggcgagtg 1440 tttgcttaac gaacctgaat ccagaccctaccctttgcct gtccaactgc caggcatcct 1500 ttacaacgtg aataaacaat gtgaattgatttttggacca ggttctcagg tgtgcccata 1560 tatgatgcag tgcagacggc tctggtgcaataacgtcaat ggagtacaca aaggctgccg 1620 gactcagcac acaccctggg ccgatgggacggagtgcgag cctggaaagc actgcaagta 1680 tggattttgt gttcccaaag aaatggatgtccccgtgaca gatggatcct ggggaagttg 1740 gagtcccttt ggaacctgct ccagaacatgtggagggggc atcaaaacag ccattcgaga 1800 gtgcaacaga ccagaaccaa aaaatggtggaaaatactgt gtaggacgta gaatgaaatt 1860 taagtcctgc aacacggagc catgtctcaagcagaagcga gacttccgag atgaacagtg 1920 tgctcacttt gacgggaagc attttaacatcaacggtctg cttcccaatg tgcgctgggt 1980 ccctaaatac agtggaattc tgatgaaggaccggtgcaag ttgttctgca gagtggcagg 2040 gaacacagcc tactatcagc ttcgagacagagtgatagat ggaactcctt gtggccagga 2100 cacaaatgat atctgtgtcc agggcctttgccggcaagct ggatgcgatc atgttttaaa 2160 ctcaaaagcc cggagagata aatgtggggtttgtggtggc gataattctt catgcaaaac 2220 agtggcagga acatttaata cagtacattatggttacaat actgtggtcc gaattccagc 2280 tggtgctacc aatattgatg tgcggcagcacagtttctca ggggaaacag acgatgacaa 2340 ctacttagct ttatcaagca gtaaaggtgaattcttgcta aatggaaact ttgttgtcac 2400 aatggccaaa agggaaattc gcattgggaatgctgtggta 
gagtacagtg ggtccgagac 2460 tgccgtagaa agaattaact caacagatcgcattgagcaa gaacttttgc ttcaggtttt 2520 gtcggtggga aagttgtaca accccgatgtacgctattct ttcaatattc caattgaaga 2580 taaacctcag cagttttact ggaacagtcatgggccatgg caagcatgca gtaaaccctg 2640 ccaaggggaa cggaaacgaa aacttgtttgcaccagggaa tctgatcagc ttactgtttc 2700 tgatcaaaga tgcgatcggc tgccccagcctggacacatt actgaaccct gtggtacaga 2760 ctgtgacctg aggtggcatg ttgccagcaggagtgaatgt agtgcccagt gtggcttggg 2820 ttaccgcaca ttggacatct actgtgccaaatatagcagg ctggatggga agactgagaa 2880 ggttgatgat ggtttttgca gcagccatcccaaaccaagc aaccgtgaaa aatgctcagg 2940 ggaatgtaac acgggtggct ggcgctattctgcctggact gaatgttcaa aaagctgtga 3000 cggtgggacc cagaggagaa gggctatttgtgtcaatacc cgaaatgatg tactggatga 3060 cagcaaatgc acacatcaag agaaagttaccattcagagg tgcagtgagt tcccttgtcc 3120 acagtggaaa tctggagact ggtcagagtgcttggtcacc tgtggaaaag ggcataagca 3180 ccgccaggtc tggtgtcagt ttggtgaagatcgattaaat gatagaatgt gtgaccctga 3240 gaccaagcca acatctatgc agacttgtcagcagccggaa tgtgcatcct ggcaggcggg 3300 tccctgggga cagtgcagtg tcacttgtggacagggatac cagctaagag cagtgaaatg 3360 catcattggg acttatatgt cagtggtagatgacaatgac tgtaatgcag caactagacc 3420 aactgatacc caggactgtg aattaccatcatgtcatcct cccccagctg ccccggaaac 3480 gaggagaagc acatacagtg caccaagaacccagtggcga tttgggtctt ggaccccatg 3540 ctcagccact tgtgggaaag gtacccggatgagatacgtc agctgccgag atgagaatgg 3600 ctctgtggct gacgagagtg cctgtgctaccctgcctaga ccagtggcaa aggaagaatg 3660 ttctgtgaca ccctgtgggc aatggaaggccttggactgg agctcttgct ctgtgacctg 3720 tgggcaaggt agggcaaccc ggcaagtgatgtgtgtcaac tacagtgacc acgtgatcga 3780 tcggagtgag tgtgaccagg attatatcccaaaaactgac caggactgtt ccatgtcacc 3840 atgccctcaa aggaccccag acagtggcttagctcagcac cccttccaaa atgaggacta 3900 tcgtccccgg agcgccagcc ccagccgcacccatgtgctc ggtggaaacc agtggagaac 3960 tggcccctgg ggagcatgtt ccagtacctgtgctggcgga tcccagcggc gtgttgttgt 4020 atgtcaggat gaaaatggat acaccgcaaacgactgtgtg gagagaataa aacctgatga 4080 gcaaagagcc tgtgaatccg gcccttgtcctcagtgggct tatggcaact ggggagagtg 4140 cactaagctg 
tgtggtggag gcataagaacaagactggtg gtctgtcagc ggtccaacgg 4200 tgaacggttt ccagatttga gctgtgaaattcttgataaa cctcccgatc gtgagcagtg 4260 taacacacat gcttgtccac acgacgctgcatggagtact ggcccttgga gctcgtgttc 4320 tgtctcttgt ggtcgagggc ataaacaacgaaatgtttac tgcatggcaa aagatggaag 4380 ccatttagaa agtgattact gtaagcacctggctaagcca catgggcaca gaaagtgccg 4440 aggaggaaga tgccccaaat ggaaagctggcgcttggagt cagtgctctg tgtcctgtgg 4500 ccgaggcgta cagcagaggc atgtgggctgtcagatcgga acacacaaaa tagccagaga 4560 gaccgagtgc aacccataca ccagaccggagtcggaacgc gactgccaag gcccacggtg 4620 tcccctctac acttggaggg cagaggaatggcaagaatgc accaagacct gcggcgaagg 4680 ctccaggtac cgcaaggtgg tgtgtgtggatgacaacaaa aacgaggtgc atggggcacg 4740 ctgtgacgtg agcaagcggc cggtggaccgtgaaagctgt agtttgcaac cctgcgagta 4800 tgtctggatc acaggagaat ggtcagagtgctcagtgacc tgtggaaaag gctacaaaca 4860 aaggcttgtc tcgtgcagcg agatttacaccgggaaggag aattatgaat acagctacca 4920 aaccaccatc aactgcccag gcacgcagccccccagtgtt cacccctgtt acctgaggga 4980 ctgccctgtc tcggccacct ggagagttggcaactggggg agctgctcag tgtcttgtgg 5040 tgttggagtg atgcagagat ctgtgcaatgtttaaccaat gaggaccaac ccagccactt 5100 atgccacact gatctgaagc cagaagaacgaaaaacctgc cgtaatgtct ataactgtga 5160 gttaccccag aattgcaagg aggtaaaaagacttaaaggt gccagtgaag atggtgaata 5220 tttcctgatg attagaggaa agcttctgaagatattctgt gcggggatgc actctgacca 5280 ccccaaagag tacgtgacac tggtgcatggagactctgag aatttctccg aggtttatgg 5340 gcacaggtta cacaacccaa cagaatgtccctataacggg agccggcgcg atgactgcca 5400 atgtcggaag gattacacgg ccgctgggttttccagtttt cagaaaatca gaatagacct 5460 gaccagcatg cagataatca ccactgacttacagtttgca aggacaagcg aaggacatcc 5520 cgtccctttt gccacagccg gggattgctacagcgctgcc aagtgcccac agggtcgttt 5580 tagcatcaac ctttatggaa ccggcttgtctttaactgaa tctgccagat ggatatcaca 5640 agggaattat gctgtctctg acatcaagaagtcgccggat ggtacccgag tcgtagggaa 5700 atgcggtggt tactgtggaa aatgcactccatcctctggt actggcctgg aggtgcgagt 5760 tttatagcta aggtgctttg aagaggaagccattatggat ggatgaagga tagtaatgca 5820 atacctccac cttaatttgg gtgcatgtgtatgtgtgtgt 
gtgtttgtgt gtgacttgta 5880 tgcttgtgtg tgtaaatgtg tgtacatatacatatataca tatctacaca tacatatata 5940 cacatatatg tgtgtatgta gatatgtagactatcctaat gatgtaaagt ttaatattta 6000 tgtttgaaat tat 6013 32 1393 DNAHomo sapiens misc_feature Incyte ID No 3965293CB1 32 gcggccagagagctcgtcat ttgaagactc tctcggaagg gatagcgtct ttctgcaacc 60 tgcggtcccagcagacaaac cttgtgatcc tcgttccagt cgacatggag gacgactcac 120 tctacttgggaggtgagtgg cagttcaacc acttttcaaa actcacatct tctcggcccg 180 atgcagcttttgctgaaatc cagcggactt ctctccctga gaagtcacca ctctcatgtg 240 agacccgtgtcgacctctgt gatgatttgg ctcctgtggc aagacagctt gctcccaggg 300 agaagcttcctctgagtagc aggagacctg ctgcggtggg ggctgggctc cagaatatgg 360 gaaatacctgctacgtgaac gcttccttgc agtgcctgac atacacaccg ccccttgcca 420 actacatgctgtcccgggag cactctcaaa cgtgtcatcg tcacaagggc tgcatgctct 480 gtactatgcaagctcacatc acacgggccc tccacaatcc tggccacgtc atccagccct 540 cacaggcattggctgctggc ttccatagag gcaagcagga agatgcccat gaatttctca 600 tgttcactgtggatgccatg aaaaaggcat gccttcccgg gcacaagcag gtagatcatc 660 actctaaggacaccaccctc atccaccaaa tatttggagg ctactggaga tctcaaatca 720 agtgtctccactgccacggc atttcagaca cttttgaccc ttacctggac atcgccctgg 780 atatccaggcagctcagagt gtccagcaag ctttggaaca gttggtgaag cccgaagaac 840 tcaatggagagaatgcctat cattgtggtg tttgtctcca gagggcgccg gcctccaaga 900 cgttaactttacacacctct gccaaggtcc tcatccttgt attgaagaga ttctccgatg 960 tcacaggcaacctcgagccg aattcggctc gagctcgagc cgaaagatcc caatgttcta 1020 cctctccctgtccctcttgt aggggatagg gaggcagaga gagccagccc ctaccctcag 1080 agtatctggacctcagagac catgttgtgc caggggtggt cccacctaaa gatgctagcc 1140 cctctccaggtgggcataag gagtaacaga tggcaaaacc acaaactatt ttgatggact 1200 gtgctgcagtatcaccagaa gacattaggg ggcagtaggc ccccacacaa aaccttcagg 1260 cttgaattttaaaggggagg actttctgcc aacttttctt gtatgccttg ggaaagccag 1320 ttgccctgaacccagcagac accatggaat gtcctttgca cgcattaaat ggtacagaac 1380 tgaaaaaaaaaaa 1393 33 1993 DNA Homo sapiens misc_feature Incyte ID No 4948403CB133 cccaaaggaa gcagcaccca gtgaacccct gccgtgagtc agcacgaggg aggcccagcc 
60ctttctagag gagcctgatt aaagatcagg ctcagctgct gctgctgctg ctgctgcttg 120tcccaagacc aagtcgtaat agcaacttcc cttcctcagc tgcctgaact ttttttttcc 180cttgtagctg gagagaagtg tcacattttg ctcactctca accttcctcg cccaccccct 240tcccggagaa cctgtgcggt gtgtagaggg tgctgtgagc cacctccagc ctcgggtggc 300tgcttaagta actttcaact cctctcttct taacactatg aagtgtctcg ggaagcgcag 360gggccaggca gctgctttcc tgcctctttg ctggctcttt ttgaagattc tgcaaccggg 420gcacagccac ctttataaca accgctatgc tggtgataaa gtgataagat ttattcccaa 480aacagaagag gaagcatatg cactgaagaa aatatcctat caacttaagg tggacctgtg 540gcagcccagc agtatctcct atgtatcaga gggaacagtt actgatgtcc atatccccca 600aaatggttcc cgagccctgt tagccttctt acaggaagcc aacatccagt acaaggtcct 660catagaagat cttcagaaaa cactggagaa gggaagcagc ttgcacaccc agagaaaccg 720aagatccctc tctggatata attatgaagt ttatcactcc ttagaagaaa ttcaaaattg 780gatgcatcat ctgaataaaa ctcactcagg cctcattcac atgttctcta ttggaagatc 840atatgaggga agatctcttt ttattttaaa gctgggcaga cgatcacgac tcaaaagagc 900tgtttggata gactgtggta ttcatgcaag agaatggatt ggtcctgcct tttgtcagtg 960gtttgtaaaa gaagctcttc taacatataa gagtgaccca gccatgagaa aaatgttgaa 1020tcatctatat ttctatatca tgcctgtgtt taacgtcgat ggataccatt ttagttggac 1080caatgatcga ttttggagaa aaacaaggtc aaggaactca aggtttcgct gccgtggagt 1140ggatgccaat agaaactgga aagtgaagtg gtgtgatgaa ggagcttcta tgcacccttg 1200tgatgacaca tactgtggcc cttttccaga atctgagccg gaagtgaagg ctgtagctaa 1260cttccttcga aaacacagaa agcacattag ggcttatctc tcctttcatg catatgctca 1320gatgttactg tatccctatt cttacaaata tgcaacaatt cccaatttta gatgtgtgga 1380atctgcagct tataaagctg tgaatgcact tcagtcagta tacggggtac gatacagata 1440tggaccagcc tccacaacgt tgtatgtgag ctctggtagc tcaatggatt gggcctacaa 1500aaatggaata ccttatgcat ttgctttcga actacgtgac actggatatt ttggattttt 1560actcccagag atgctcatca aacccacctg tacagaaact atgctggctg tgaaaaatat 1620cacaatgcac ctgctaaaga aatgtccctg agacagccca aggctcaggt caactgccat 1680aggattctga gcaaggccta cttggccctg gatagaaatt gttttcaaag agaagggcag 1740ctgcttagag tgaacatgtc tatggacttt aaaaagaccc 
cacgcaattt gactttgtgg 1800caatagaaaa cagtaaaaaa cagggcatag cctagtttgt tataagaaaa agcatccatt 1860ttctatcctt ttagagtctt atttgattat ggtgggaggg aatgttttca aatttcccat 1920ttctcaagaa atgttcatat taattgagga tttcccttca ataaatctca tgtcctcagt 1980taggaaaaaa aaa 1993 34 2318 DNA Homo sapiens misc_feature Incyte ID No7473165CB1 34 cggcagccac tcctgagtga gcaaaggttc ctccgcggtg ctctcccgtccagagccctg 60 ctgatgggga agtccgaggg ccagtgggga tggtggagag cgccggccgtgcagggcaga 120 agcgcccggg gttcctggag ggggggctgc tgctgctgct gctgctggtgaccgctgccc 180 tggtggcctt gggtgtcctc tacgccgacc gcagagggat cccagaggcccaagaggtga 240 gcgaggtctg caccacccct ggctgcgtga tagcagccgc caggatcctccagaacatgg 300 acccgaccac ggaaccgtgt gacgacttct accagtttgc atgcggaggctggctgcggc 360 gccacgtgat ccctgagacc aactcaagat acagcatctt tgacgtcctccgcgacgagc 420 tggaggtcat cctcaaagcg gtgctggaga attcgactgc caaggaccggccggctgtgg 480 agaaggccag gacgctgtac cgctcctgca tgaaccagag tgtgatagagaagcgaggct 540 ctcagcccct gctggacatc ttggaggtgg tgggaggctg gccggtggcgatggacaggt 600 ggaacgagac cgtaggactc gagtgggagc tggagcggca gctggcgctgatgaactcac 660 agttcaacag gcgcgtcctc atcgacctct tcatctggaa cgacgaccagaactccagcc 720 ggcacatcat ctacatagac cagcccacct tgggcatgcc ctcccgagagtactacttca 780 acggcggcag caaccggaag gtgcgggaag cctacctgca gttcatggtgtcagtggcca 840 cgttgctgcg ggaggatgca aacctgccca gggacagctg cctggtgcaggaggacatgg 900 tgcaggtgct ggagctggag acacagctgg ccaaggccac ggtaccccaggaggagagac 960 acgacgtcat cgccttgtac caccggatgg gactggagga gctgcaaagccaatttggcc 1020 tgaagggatt taactggact ctgttcatac aaactgtgct atcctctgtcaaaatcaagc 1080 tgctgccaga tgaggaagtg gtggtctatg gcatccccta cctgcagaaccttgaaaaca 1140 tcatcgacac ctactcagcc aggaccatac agaactacct ggtctggcgcctggtgctgg 1200 accgcattgg tagcctaagc cagagattca aggacacacg agtgaactaccgcaaggcgc 1260 tgtttggcac aatggtggag gaggtgcgct ggcgtgaatg tgtgggctacgtcaacagca 1320 acatggagaa cgccgtgggc tccctctacg tcagggaggc gttccctggagacagcaaga 1380 gcatggtgga actcattgac aaggtgcgga cagtgtttgt ggagacgctggacgagctgg 1440 gctggatgga 
cgaggagtcc aagaagaagg cgcaggagaa ggccatgagcatccgggagc 1500 agatcgggca ccctgactac atcctggagg agatgaacag gcgcctggacgaggagtact 1560 ccaatgtgaa cttctcagag gacctgtact ttgagaacag tctgcagaacctcaaggtgg 1620 gcgcccagcg gagcctcagg aagcttcggg aaaaggtgga cccaaatctgatcatcgggg 1680 cggcggtggt caatgcgttc tactccccaa accgaaacca gattgtattccctgccggga 1740 tcctccagcc ccccttcttc agcaaggagc agccacaggc cttgaactttggaggcattg 1800 ggatggtgat cgggcacgag atcacgcacg gctttgacga caatggtcggaacttcgaca 1860 agaatggcaa catgatggat tggtggagta acttctccac ccagcacttccgggagcagt 1920 cagagtgcat gatctaccag tacggcaact actcctggga cctggcagacgaacagaacg 1980 tgaacggatt caacaccctt ggggaaaaca ttgctgacaa cggaggggtgcggcaagcct 2040 ataaggccta cctcaagtgg atggcagagg gtggcaagga ccagcagctgcccggcctgg 2100 atctcaccca tgagcagctc ttcttcatca actatgccca ggtgtggtgcgggtcctacc 2160 ggcccgagtt cgccatccaa tccatcaaga cagacgtcca cagtcccctgaagtacaggg 2220 tactggggtc gctgcagaac ctggccgcct tcgcagacac gttccactgtgcccggggca 2280 cccccatgca ccccaaggag cgatgccgcg tgtggtag 2318 35 1931DNA Homo sapiens misc_feature Incyte ID No 7476667CB1 35 cccttatatcatcgtcttcg ccatctacaa atgaaatgtt caccctaact accaatgggg 60 acctaccccgaccaatattc atccccaatg gaatgccaaa cactgttgtg ccatgtggaa 120 ctgagaagaacttcacaaat ggaatggtta atggtcacat gccatctctt cctgacagcc 180 cctttacaggttacatcatt gcagtccacc gaaaaatgat gaggacagaa ctgtatttcc 240 tgtcatctcagaagaatcgc cccagcctct ttggaatgcc attgattgtt ccatgtactg 300 tgcatacccggaagaaagac ctatatgatg cggtttggat tcaagtatcc cggttagcga 360 gcccactcccacctcaggaa gctagtaatc atgcccagga ttgtgacgac agtatgggct 420 atcaatatccattcactcta cgagttgtgc agaaagatgg gaactcctgt gcttggtgcc 480 catggtatagattttgcaga ggctgtaaaa ttgattgtgg ggaagacaga gctttcattg 540 gaaatgcctatatcgctgtg gattgggatc ccacagccct tcaccttcgc tatcaaacat 600 cccaggaaagggttgtagat gagcatgaga gtgtggagca gagtcggcga gcgcaagccg 660 agcccatcaacctggacagc tgtctccgtg ctttcaccag tgaggaagag ctaggggaaa 720 atgagatgtactactgttcc aagtgtaaga cccactgctt agcaacaaag aagctggatc 780 tctggaggcttccacccatc 
ctgattattc accttaagcg atttcaattt gtaaatggtc 840 ggtggataaaatcacagaaa attgtcaaat ttcctcggga aagttttgat ccaagtgctt 900 ttttggtaccaagagacccg gctctctgcc agcataaacc actcacaccc cagggggatg 960 agctctctgagcccaggatt ctggcaaggg aggtgaagaa agtggatgcg cagagttcgg 1020 ctggggaagaggacgtgctc ctgagcaaaa gcccatcctc actcagcgct aacatcatca 1080 gcagcccgaaaggttctcct tcttcatcaa gaaaaagtgg aaccagctgt ccctccagca 1140 aaaacagcagccctaatagc agcccacgga ctttggggag gagcaaaggg aggctccggc 1200 tgccccagattggcagcaaa aataaactgt caagtagtaa agagaacttg gatgccagca 1260 aagaaaatggggctgggcag atatgtgagc tggctgacgc cttgagtcga gggcatgtgc 1320 tggggggcagccaaccagag ttggtcactc ctcaggacca tgaggtagct ttggccaatg 1380 gattcctttatgagcatgaa gcatgtggca atggctacag caatggtcag cttggaaacc 1440 acagtgaagaagacagcact gatgaccaaa gagaagatac tcgtattaag cctatttata 1500 atctatatgcaatttcgtgc cattcaggaa ttctgggtgg gggccattac gtcacttatg 1560 ccaaaaacccaaactgcaag tggtactgtt acaatgacag cagctgtaag gaacttcacc 1620 cggatgaaattgacaccgac tctgcctaca ttcttttcta tgagcagcag gggatagact 1680 atgcacaatttctgccaaag actgatggca aaaagatggc agacacaagc agtatggatg 1740 aagactttgagtctgattac aaaaagtact gtgtgttaca gtaaagctac cactctggct 1800 gctagacagcttggcggtga gggagatgac tccttgtagc tgacatttgg caaaagcgtc 1860 actgaaaggcaagctaaatg tagttatttt atcctgtggc cctgaagcaa aaaataaaaa 1920 ttcgaattaa g1931 36 1218 DNA Homo sapiens misc_feature Incyte ID No 7479166CB1 36atgctcagcc ccccgcagcc caggacccct gactgtaggc tccaggcctc cctggaagcc 60ctggccacgc tcgccccgca gccctcagac tggctgtgct tcgcggatct tggctggttc 120gaggctgatg gagctgccca ctccatgggc ctgggcagca gcttgaagtg ggcgtgggcc 180aagccctctg ggatgcccgt cccagagaat gacctggtgg gcattgtggg gggccacaat 240gcccccccgg ggaagtggcc gtggcaggtc agcctgaggg tctacagcta ccactgggcc 300tcctgggcgc acatctgtgg gggctccctc atccaccccc agtgggtgct gactgctgcc 360cactgcattt tctggaagga caccgacccg tccatctacc ggatccacgc tggggacgtg 420tatctctacg ggggccgggg gctgctgaac gtcagccgga tcatcgtcca ccccaactat 480gtcactgcgg ggctgggtgc ggatgtggcc ctgctccagc tgccggggtc 
acctctctcc 540ccagagtcgc tgccgccgcc ctaccgcctg cagcaggcga gtgtgcaggt gctggagaac 600gccgtctgtg agcagcccta ccgcaacgcc tcagggcaca ctggcgaccg gcagctcatc 660ctggatgaca tgctgtgtgc cggcagcgag ggccgagact cctgctacgg tgactccggc 720ggccctctgg tctgcaggct gcgggggtcc tggcgcctgg tgggggtggt cagctggggc 780tacggctgta ccctgcggga ctttcccggc gtctacaccc acgtccagat ctacgtgctc 840tggatcctgc agcaagtcgg ggagttgccc tgagcaggct gggctgggct cccacctggg 900tcggctgagg agggaccagg accttcctcc tcccagcgat ctccgcttcg gcctccgctg 960caggccaccg tcttgagccc ggcttctctg gctcctcagc gcccaggacc tccctgatgc 1020cggggtgggg aaggggccgg ggaagggagg gtgggggcct cgctgcgtct ctgtctgatt 1080aaagagcaag agcagagtgt gtggcgtctc tgtgggatgg atttgcattc caagctgcag 1140ccaggtgcgg tttgctcagc cacctcctgt tggaggcctc cacattttgg ctatggtaat 1200aaagatgctg agaaaatt 1218 37 2679 DNA Homo sapiens misc_feature Incyte IDNo 3671788CB1 37 caattaatat taacgaggga aggctcctca ttgcctaaag accccactggggctccaatg 60 gaagagaggc cccgcccccg tgactcagag gttaaagggc ctggtgccggcttgtgaggc 120 cagtgtccag atggcatcca gcagtgggag ggtcaccatc cagctcgtggatgaggaggc 180 tggggtcgga gccgggcgcc tgcagctttt tcggggccag agctatgaggcaattcgggc 240 agcctgcctg gattcgggga tcctgttccg cgacccttac ttccctgctggccctgatgc 300 ccttggctat gaccagctgg ggccggactc ggagaaggcc aaaggcgtgaaatggatgag 360 gccccatgag ttctgtgctg agccgaagtt catctgtgaa gacatgagccgcacagacgt 420 gtgtcagggg agcctgggta actgctggtt ccttgcagcc gccgcctcccttactctgta 480 tccccggctc ctgcgccggg tggtccctcc tggacaggat ttccagcatggctacgcagg 540 cgtcttccac ttccagctct ggcagtttgg ccgctggatg gacgtcgtggtggatgacag 600 gctgcccgtg cgtgagggga agctgatgtt cgtgcgctcg gaacagcggaatgagttctg 660 ggccccactc ctggagaagg cctacgccaa gctccacggc tcctatgaggtgatgcgggg 720 cggccacatg aatgaggctt ttgtggattt cacaggcggc gtgggcgaggtgctctatct 780 gagacaaaac agcatggggc tgttctctgc cctgcgccat gccctggccaaggagtccct 840 cgtgggcgcc actgccctga gtgatcgggg tgagtaccgc acagaagagggcctggtaaa 900 gggacacgcg tattccatca cgggcacaca caaggtgttc ctgggcttcaccaaggtgcg 960 gctgctgcgg ctgcggaacc 
catggggctg cgtggagtgg acgggggcctggagcgacag 1020 ctgcccacgc tgggacacac tccccaccga gtgccgcgat gccctgctggtgaaaaagga 1080 ggatggcgag ttctggatgg agctgcggga cttcctcctc catttcgacaccgtgcagat 1140 ctgctcgctg agcccggagg tgctgggccc cagcccggag gggggcggctggcacgtcca 1200 caccttccaa ggccgctggg tgcgtggctt caactccggc gggagccagcctaatgctga 1260 aaccttctgg accaatcctc agttccgttt aacgctgctg gagcctgatgaggaggatga 1320 cgaggatgag gaagggccct gggggggctg gggggctgca ggggcacggggcccagcgcg 1380 ggggggccgc acgcccaagt gcacggtcct tctgtccctc atccagcgcaaccggcggcg 1440 cctgagagcc aagggcctca cttacctcac cgttggcttc cacgtgttccaggcagaggg 1500 ctccacaggc acagacaacg agcggacaca cggcttcacc ggacacagaggagcacagct 1560 cgccggtcac acacacggcc cacaagaggc gagcaaaaga tacacgcagaacagcgctga 1620 ggtagcccca gatagggaag cggacgacga cgggggacag gggttcggcgacgggccatg 1680 ggagatcgac gacgtgatca gcgcagacct gcagtctctc cagggcccctacctgcccct 1740 ggagctgggg ttggagcagc tgtttcagga gctggctgga gaggaggaagaactcaatgc 1800 ctctcagctc caggccttac taagcattgc cctggagcct gccagggcccatacctccac 1860 ccccagagag atcgggctca ggacctgtga gcagctgctg cagtgtttcgggcatgggca 1920 aagcctggcc ttacaccact tccagcagct ctggggctac ctcctggagtggcaggccat 1980 attcaacaag ttcgatgagg acacctctgg aaccatgaac tcctacgagctgaggctggc 2040 actgaatgca gcaggcttcc acctgaacaa ccagctgacc cagaccctcaccagccgcta 2100 ccgggatagc cgtctgcgtg tggacttcga gcggttcgtg tcctgtgtggcccacctcac 2160 ctgcatcttc tgccactgca gccagcacct ggatgggggt gagggggtcatctgcctgac 2220 ccacagacag tggatggagg tggccacctt ctcctaggat ctccggatgggcgcacctgc 2280 tgctcagggc agggttgctg agcaagacca cctccctagg ccttgcctggcatgggtgcc 2340 actctctctg gcatccacct gtctggggct agtctctggc cctcactgctcacggccggg 2400 tgaccactct ggcctgcgta ctcctcactc agaaacaaga acagcgacagcccttctcga 2460 gcagatgaca cgagctagtc cacgttgaca gcttaagaca gtgctagctctgccctggct 2520 ctcctagaag gtggaggaca gacacaggag aaataaaaaa agatgatgctgcaggaatcc 2580 ttcttaaaaa tattacatgt tttattatcc tgtccccaga gggtggtttatccagaaacc 2640 aagaaaaaaa atcaatcaga ataaactcaa aaaaaaaaa 2679 38 2632DNA 
Homo sapiens misc_feature Incyte ID No 7479181CB1 38 gggagagcctggcgagctga aacccgagct cccgctcagc tggggctcgg ggaggtccct 60 gtaaaacccgcctgcccccg gcctccctgg gtccctcctc tccctcccca gtagacgctc 120 gggcaccagccgcggcaagg atggagctgg gttgctggac gcagttgggg ctcacttttc 180 ttcagctccttctcatctcg tccttgccaa gagagtacac agtcattaat gaagcctgcc 240 ctggagcagagtggaatatc atgtgtcggg agtgctgtga atatgatcag attgagtgcg 300 tctgccccggaaagagggaa gtcgtgggtt ataccatccc ttgctgcagg aatgaggaga 360 atgagtgtgactcctgcctg atccacccag gttgtaccat ctttgaaaac tgcaagagct 420 gccgaaatggctcatggggg ggtaccttgg atgacttcta tgtgaagggg ttctactgtg 480 cagagtgccgagcaggctgg tacggaggag actgcatgcg atgtggccag gttctgcgag 540 ccccaaagggtcagattttg ttggaaagct atcccctaaa tgctcactgt gaatggacca 600 ttcatgctaaacctgggttt gtcatccaac taagatttgt catgttgagc ctggagtttg 660 actacatgtgccagtatgac tatgttgagg ttcgtgatgg agacaaccgc gatggccaga 720 tcatcaagcgtgtctgtggc aacgagcggc cagctcctat ccagagcata ggatcctcac 780 tccacgtcctcttccactcc gatggctcca agaattttga cggtttccat gccatttatg 840 aggagatcacagcatgctcc tcatcccctt gtttccatga cggcacgtgc gtccttgaca 900 aggctggatcttacaagtgt gcctgcttgg caggctatac tgggcagcgc tgtgaaaatc 960 cctgccgagaaccaaagatt tcagacctgg tgagaaggag agttcttccg atgcaggttc 1020 agtcaagggagacaccatta caccagctat actcagcggc cttcagcaag cagaaactgc 1080 agagtgcccctaccaagaag ccagcccttc cctttggaga tctgcccatg ggataccaac 1140 atctgcatacccagctccag tatgagtgca tctcaccctt ctaccgccgc ctgggcagca 1200 gcaggaggacatgtctgagg actgggaagt ggagtgggcg ggcaccatcc tgcatcccta 1260 tctgcgggaaaattgagaac atcactgctc caaagaccca agggttgcgc tggccgtggc 1320 aggcagccatctacaggagg accagcgggg tgcatgacgg cagcctacac aagggagcgt 1380 ggttcctagtctgcagcggt gccctggtga atgagcgcac tgtggtggtg gctgcccact 1440 gtgttactgacctggggaag gtcaccatga tcaagacagc agacctgaaa gttgttttgg 1500 ggaaattctaccgggatgat gaccgggatg agaagaccat ccagagccta cagatttctg 1560 ctatcattctgcatcccaac tatgacccca tcctgcttga tgctgacatc gccatcctga 1620 agctcctagacaaggcccgt atcagcaccc gagtccagcc catctgcctc gctgccagtc 1680 
gggatctcagcacttccttc caggagtccc acatcactgt ggctggctgg aatgtcctgg 1740 cagacgtgaggagccctggc ttcaagaacg acacactgcg ctctggggtg gtcagtgtgg 1800 tggactcgctgctgtgtgag gagcagcatg aggaccatgg catcccagtg agtgtcactg 1860 ataacatgttctgtgccagc tgggaaccca ctgccccttc tgatatctgc actgcagaga 1920 caggaggcatcgcggctgtg tccttcccgg gacgagcatc tcctgagcca cgctggcatc 1980 tgatgggactggtcagctgg agctatgata aaacatgcag ccacaggctc tccactgcct 2040 tcaccaaggtgctgcctttt aaagactgga ttgaaagaaa tatgaaatga accatgctca 2100 tgcactccttgagaagtgtt tctgtatatc cgtctgtacg tgtgtcattg cgtgaagcag 2160 tgtgggcctgaagtgtgatt tggcctgtga acttggctgt gccagggctt ctgacttcag 2220 ggacaaaactcagtgaaggg tgagtagacc tccattgctg gtaggctgat gccgcgtcca 2280 ctactaggacagccaattgg aagatgccag ggcttgcaag aagtaagttt cttcaaagaa 2340 gaccatatacaaaacctctc cactccactg acctggtggt cttccccaac tttcagttat 2400 acgaatgccatcagcttgac cagggaagat ctgggcttca tgaggcccct tttgaggctc 2460 tcaagttctagagagctgcc tgtgggacag cccagggcag cagagctggg atgtggtgca 2520 tgcctttgtgtacatggcca cagtacagtc tggtcctttt ccttccccat ctcttgtaca 2580 cattttaataaaataagggt tggcttctga actacaaaaa aaaaaaaaaa aa 2632 39 2757 DNA Homosapiens misc_feature Incyte ID No 6621372CB1 39 atgccagggg gcgcaggcgccgccaggctc tgcttgctgg cgtttgccct gcagcccctc 60 cggccgcggg cggcgcgggagcctggatgg acaagaggaa gtgaggaagg cagccccaag 120 ctgcagcatg aacttatcatacctcagtgg aagacttcag aaagccccgt gagagaaaag 180 catccactca aagctgagctcagggtaatg gctgaggggc gagaactgat cctggacctg 240 gagaagaatg agcaactttttgctccttcc tacacagaaa cccattatac ttcaagtggt 300 aaccctcaaa ccaccacacggaaattggag gatcactgct tttaccacgg cacggtgagg 360 gagacagaac tgtccagcgtcacgctcagc acttgccgag gaattagagg actgattacg 420 gtgagcagca acctcagctacgtcatcgag cccctccctg acagcaaggg ccaacacctt 480 atttacagat ctgaacatctcaagccgccc ccgggaaact gtgggttcga gcactccaag 540 cccaccacca gggactgggctcttcagttt acacaacaga ccaagaagcg acctcgcagg 600 atgaaaaggg aagatttaaactccatgaag tatgtggagc tttacctcgt ggctgattat 660 ttagagtttc agaagaatcgacgagaccag gacgccacca aacacaagct catagagatc 
720 gccaactatg ttgataagttttaccgatcc ttgaacatcc ggattgctct cgtgggcttg 780 gaagtgtgga cccacgggaacatgtgtgaa gtttcagaga atccatattc taccctctgg 840 tcctttctca gttggaggcgcaagctgctt gcccagaagt accatgacaa cgcccaatta 900 atcacgggca tgtccttccacggcaccacc atcggcctgg cccccctcat ggccatgtgc 960 tctgtgtacc agtctggaggagtcaacatg gaccactccg agaatgccat tggcgtggct 1020 gccaccatgg cccacgagatgggccacaac tttggcatga cccatgattc tgcagattgc 1080 tgctcggcca gtgcggctgatggtgggtgc atcatggcag ctgccactgg gcaccccttt 1140 cccaaagtgt tcaatggatgcaacaggagg gagctggaca ggtatctgca gtcaggtggt 1200 ggaatgtgtc tctccaacatgccagacacc aggatgttgt atggaggccg gaggtgtggg 1260 aacgggtatc tggaagatggggaagagtgt gactgtggag aagaagagga atgtaacaac 1320 ccctgctgca atgcctctaattgtaccctg aggccggggg cggagtgtgc tcacggctcc 1380 tgctgccacc agtgtaagctgttggctcct gggaccctgt gccgcgagca ggccaggcag 1440 tgtgacctcc cggagttctgtacgggcaag tctccccact gccctaccaa cttctaccag 1500 atggatggta ccccctgtgagggcggccag gcctactgct acaacggcat gtgcctcacc 1560 taccaggagc agtgccagcagctgtgggga cccggagccc gacctgcccc tgacctctgc 1620 ttcgagaagg tgaatgtggcaggagacacc tttggaaact gtggaaagga catgaatggt 1680 gaacacagga agtgcaacatgagagatgcg aagtgtggga agatccagtg tcagagctct 1740 gaggcccggc ccctggagtccaacgcggtg cccattgaca ccactatcat catgaatggg 1800 aggcagatcc agtgccggggcacccacgtc taccgaggtc ctgaggagga gggtgacatg 1860 ctggacccag ggctggtgatgactggaacc aagtgtggct acaaccatat ttgctttgag 1920 gggcagtgca ggaacacctccttctttgaa actgaaggct gtgggaagaa gtgcaatggc 1980 catggggtct gtaacaacaaccagaactgc cactgcctgc cgggctgggc cccgcccttc 2040 tgcaacacac cgggccacgggggcagtatc gacagtgggc ctatgccccc tgagagtgtg 2100 ggtcctgtgg tagctggagtgttggtggcc atcttggtgc tggcggtcct catgctgatg 2160 tactactgct gcagacagaacaacaaacta ggccaactca agccctcagc tctcccttcc 2220 aagctgaggc aacagttcagttgtcccttc agggtttctc agaacagcgg gactggtcat 2280 gccaacccaa ctttcaagctgcagacgccc cagggcaagc gaaaggtgat caacactccg 2340 gaaatcctgc ggaagccctcccagcctcct ccccggcccc ctccagatta tctgcgtggt 2400 gggtccccac ctgcaccactgccagctcac 
ctgagcaggg ctgctaggaa ctccccaggg 2460 cccgggtctc aaatagagaggacggagtcg tccaggaggc ctcctccaag ccggccaatt 2520 ccccccgcac caaattgcatcgtttcccag gacttctcca ggcctcggcc gccccagaag 2580 gcactcccgg caaacccagtgccaggccgc aggagcctcc ccaggccagg aggtgcatcc 2640 ccactgcggc cccctggtgctggccctcag cagtcccggc ctctggcagc acttgcccca 2700 aagtttccag aatacagatcacagagggct ggagggatga ttagctcgaa aatctag 2757 40 1892 DNA Homo sapiensmisc_feature Incyte ID No 4847254CB1 40 ttcttcaggt tgttcggccg ttgttctctgtgtgctccgt tctgggggtg tctttgtagt 60 cttggcctct gtttttcatg tgttgcgctctcgcctcgcg gcctcccttt cccgcgcccc 120 gtcgtcgtag tcctgctctg cctcttgctttgtcttcttc tgtatctttc tgcttcgttt 180 cctgtcttcg ttctctcatg tttctttcgtgctgccgtct tctcgctcgc gtcttctgtc 240 tctcgttctc gtcatgtttc tcttctcgtccccgtccctg tctcctgtct tcctcttgta 300 tctcctcctc ctctgcctct cctagaatctccctcgccct cgccccgctc ctccatgaac 360 tcgcacggca ccgtccccgc ctctccagaatcccccgtcc ccgcccccag aatctccccg 420 ccccgccccc agaacccccg ccccgcccccagaacccccg ccccgccccc agaacccccg 480 ccccgcgagg atgagcccag ggctccacggtccctaccta gaccccacgc gatccctcac 540 ctgagacccc gtcccacaca gccccagctggggcaaacag ccccctcccc acttcccatc 600 tgtaatttgc agggagatcg acgacgtgatcagcgcagac ctgcagtctc tccaggtggg 660 gactgttcct ggaggggcgg catggggcggggatcttggc cagcgctaaa cttccgccat 720 gcggcagggc ccctacctgc ccctggagctggggttggag cagctgtttc aggagctggc 780 tggagaggag gaagaactca atgcctctcagctccaggcc ttactaagca ttgccctgga 840 gcctgccagg gcccatacct ccacccccagagagatcggg ctcaggacct gtgagcagct 900 gctgcagtgt ttcggggtac atggggggcagtgcctgggt gagggaggga gtggggaagg 960 ggacgttggg gtctctcctc cccttctggagagattgacc ttaaccagat gcccccgacc 1020 cccaacacag catgggcaaa gcctggccttacaccacttc cagcagctct ggggctacct 1080 cctggagtgg caggccatat ttaacaagttcgatgaggac acctctggaa ccatgaactc 1140 ctacgagctg aggctggcac tgaatgcagcaggcttccac ctgaacaacc agctgaccca 1200 gaccctcacc agccgctacc gggatagccgtctgcgtgtg gacttcgagc ggttcgtgtc 1260 ctgtgtggcc cacctcacct gcatcttctgccactgcagc cagcacctgg atgggggtga 1320 gggggtcatc tgcctgaccc 
acagacagtggatggaggtg gccaccttct cctaggatct 1380 ccggatgggc gcacctgctg ctcagggcagggttgctgag caagaccacc tccctaggcc 1440 ttgcctggca tgggtgccac tctctctggcatccacctgt ctggggctag tctctggccc 1500 tcactgctca cggccgggtg accactctggcctgcgtact cctcactcag aaacaagaac 1560 agcgacagcc ccttctcgag cagatgacacgagctagtcc acgttgacag cttaagacag 1620 gtgctagctc tgcctggctc tcctagaaggtggaggacag acacgggaga aatacacaaa 1680 gatgaatgtt gccaggaatt ccttctttaaaatttcacca tgtgttatta tcctgtcccc 1740 agagggtggt ttatccagaa accaggaaaaaatcatccga taactccaaa aaaaaaaggg 1800 ggccgcgata tgggccggcg acgggaataaccggaccgac tgggcggggg gagatcaatc 1860 agcttggacc gcccgggggg cggccaatcctg 1892 41 3172 DNA Homo sapiens misc_feature Incyte ID No 5776350CB1 41atgaagctgg agccattaca agagcgtgag cccgcgccgg aggagaactt gacgtggagc 60agcagcggcg gcgacgagaa ggtgctccct tcaatccccc ttcgctgtca cagcagctcc 120tcgcccgttt gcccgcgccg caagccccgc cctcggcccc agccccgggc ccgctcccgc 180agccagcctg ggctctcggc cccacccccg cctccagccc ggcccccgcc cccgccgcca 240cccccgcccc cacccgcacc gcggcccagg gcctggcgtg gatcccggcg cagatcccgg 300cctgggtcca ggcctcagac acggagaagc tgctctggtg acctagacgg gtcgggggat 360cctggcggct taggggactg gttgctggaa gtcgagtttg gtcagggtcc cacaggctgc 420tctcatgtgg agagctttaa agtaggtaag aactggcaga agaacctgag gttgatctac 480cagcgtttcg tttggagtgg gaccccagag actaggaaac gtaaagcaaa gtcatgcatc 540tgtcacgtat gtagtaccca tatgaacaga ctccactctt gtctctcctg tgtctttttt 600ggctgcttca ctgagaaaca tattcacaaa catgcagaaa caaagcagca ccatttagct 660gtagaccttt atcatggggt catatattgc ttcatgtgta aggattatgt atatgacaaa 720gacatagaac agattgccaa agaaacaaaa gaaaaaattt tgagattatt aacttccacc 780tcaacagatg tttctcatca acagtttatg acatcagggt ttgaagacaa gcaatcaacc 840tgtgagacaa aggaacagga gccaaaattg gtgaaaccca agaaaaagag aagaaaaaag 900tcagtctata ctgtaggcct gagagggcta atcaatcttg ggaacacttg ttttatgaat 960tgtattgtcc aggcacttac ccatattcct ctactgaaag atttcttcct ctctgacaag 1020cacaaatgta taatgacaag ccccagcttg tgtctggtct gtgaaatgtc ttcgcttttt 1080catgctatgt actctgggag ccgaactcct cacattccct 
ataagttact gcatctgata 1140tggatccatg cagaacattt agcagggtac aggcagcagg atgcccatga gttccttatt 1200gcaatattag acgtgctaca tagacacagc aaagatgata gtggtgggca ggaggccaat 1260aaccccaact gctgtaactg catcatagac caaatcttta caggtggcct gcaatcagat 1320gtcacatgtc aagcctgcca tagtgtttct accaccatag acccatgctg ggacatcagt 1380ttggacttgc ctggctcttg tgccacattc gattcccaga acccagagag ggctgacagc 1440acagtgagca gggatgacca cataccagga atcccctcac ttacagactg tctacagtgg 1500tttacaaggc cagagcacct aggaagcagt gccaaaatca aatgcaatag ttgccaaagc 1560taccaggagt ctactaaaca gctcacaatg aaaaaattac ccattgtggc ttgttttcat 1620ctcaagcggt ttgagcatgt aggcaaacag aggcgaaaga ttaatacctt tatctccttt 1680cccttggagc tggacatgac tccgtttttg gcctctacta aagagagcag aatgaaagaa 1740ggccagccac caacagattg tgtgcccaat gagaataagt attccttgtt tgcagtgatt 1800aatcaccatg gaactttgga aagtggccac tataccagct tcatccggca acaaaaggac 1860cagtggttca gctgtgatga tgccatcatc accaaggcta ccattgagga cttactctac 1920agtgaagggt atttactgtt ctatcacaaa cagggtctag agaaagacta gtcttaccag 1980accacttact gaaaaaaaag taaatgatta ggcaaggatt ttgaagtgac acacagacct 2040acttggaatg gacaatgaca gtaacaccta tgtgacagct agtatcttga tataaagaac 2100ctattttagc atggcccatg ggtctgtcgg aagaaaaaaa tgaatactaa ccagtgacca 2160ttcaacctta agaaatgggg agagggagaa gaggttgaaa atggtcacat aaagcataat 2220gaaatgaaaa gaatgcttta ggtggggaca acgggagtag aagtgttctg atgctactct 2280atgtcatttg tttttacaga aatatcttgt gaagtcaggg agtattcctt tatcagcaaa 2340aacttcacaa ttggtgttcc agctgtggct gaccagctaa atagtttgaa agaaaaataa 2400tattttaaaa taaagtttaa agagctttaa aagaaaaaca tttaaaaagg aaaaaatcat 2460ttttaagatt ttaaaagaaa aaaactttta aatgttgaaa aaaatttaag ttgttatttt 2520taaaagaaat attttaaaag ttaaaaataa ttttttaatt taaagaagtt tcagaatttt 2580aaaaattaaa agcaaagaaa attaaattct taaagtttaa aaatgtaaaa taaattaagg 2640aacaaggtta aaaatgaaag tttaccaaaa aaaggaagaa aatactgtta aaaattaaag 2700ttagaaacaa aggaacatct taaaagtttc aaatgaagga ataatataaa tagatatttc 2760aaaattaaag cataaaatat acgtatttaa aaagtgttaa caaaattact actataatga 2820ttaagaaata 
aattttcaaa aatacagaat ggaatgcaat tcagatttta gagaaaagtt 2880ttaaaagagg caagtttaga ataattcaag acaaaaagac aaaatgtgtt taaagacaaa 2940aattgacaaa atacaggaag aaaatagaga cttgtaaaat aaaaagaacc ttagataagt 3000tcaagagatt taaatgaaaa ctttaaatat ttaaataaag atttaaaaat ttaagctttt 3060aaaaagaaaa acagttacat aaaaattgac cagtgaaaaa atgtgaaaga ttccagtaga 3120aaacattatt aaaattaaca ggtttaagag gtctattntt ttatttaagc at 3172 42 1997DNA Homo sapiens misc_feature Incyte ID No 7473300CB1 42 ataatccacccatggaacca atctgaaaag aatgcagtca gaccctggac ccagtctgaa 60 ggtgatgttctgcaaccttg gatctatgct gaaagcaata cagtcagact ctgggcccat 120 tctgaaactgataaaataaa acaatatact gagcctgaat ctcaagcaat taggatgtgg 180 cctgaagaggatatgttcgc actttggtcc ccaacacaaa acgatgcagt ttggccatgg 240 acccaagtggaatcacaaat gacccactcc tggacccaga atcaacttag tataaattac 300 ccttggactcagcatgtacc tgctgcaatc agaccatgga cttactctga aattcaaccc 360 tgcacccaccctgaagccaa tacagtgata agatactggt tccagactca aatgagttca 420 ttaaatcctgggaccaacct gaaactgaag tattccaaat ttggactatg ttgctcacac 480 aaagcctgtttgggggtctc ttcacacgga cacgtgagac agtttgtata tttcagccct 540 ggactcagcaaagagttact acaaatcgtt cgtggaccca ccctgaaacc caagcagaga 600 gactctggatcaagcaggaa actgaagata gagacagatc ttcgttttac attcaaatga 660 ataaaggcagaccatgggtt tatttgaaat atcaaatagt cggcgcctgg atccagcctg 720 aacttgatgtaattcactct tttatccagt ctgaaacctt cctattaaga ttctggccca 780 aggttctatctccagtagtc aaaccatgga tcttgcttaa aggaagaaca ctcatatctt 840 ggatactgcctgtaacccga gcagacactg gatccagtct gaagttcatc ttattgaatc 900 cttcggtgtttttaaagccg gcaaaccatc tgagtacctg ggaccgcagg cacacgctac 960 tgcatctggataattttgtt gttgttgttc ttgctgttga aagtcctgga attgtgcaaa 1020 aacggcacctgagcatccta caagtcagca cttgtgccca attttggctc aagctgaatg 1080 aactcactttctgggtggag gccaagaaag ccatgtggat ggctgactat cagggagtga 1140 cacagtctagctatgctccc tggtacaagc aagggcccat gactacctct gcttctatgt 1200 cccattcagtctctacctct acaaatgctt cagcttttac ctccacccct gcttctcttt 1260 ggccacacttctctctgcca cagcctcaga gtaaggctca aaaacttggt agagatcaga 1320 
tttatctgcgatatgccatg ccttggaagg ctgtcatcat catctgtggg agtcagatct 1380 gcagtggttccatagttggc agctcttgga ttctcacagc tgcccactgt gtcaggaaac 1440 tcagggatcctgaagacact gctgtgatac tgggcctgag gcatcctggg gcaccactga 1500 gagttgtgaaggtgtctacc attctgctgc atgagagatt ctggttggtg actgaggcag 1560 caagaaatattctggaattg ctactcctcc acgatgtcca gactcccatt tggctcttat 1620 cactcttgggctatctgagg aacctgaata gttcagaatg ctggctctct aggccacata 1680 ttgttacaccagctgtcctg cttagacacc cctgggcccc agggggaccg caacctcacc 1740 caggcactggaccactccca cagattcagg ctcagcagcc taacctgcaa atccatcatg 1800 tagctcagcaggacttcatc atttgtgacc ctggtccata tctgggccca agtcttgagc 1860 accatgtgtttctgggctgg ctccccgcaa ccctgctcct gggacctagg cgcccacccc 1920 ctgctgccagccatcccgaa ttagcagctg cgaagacatg gctctggccc ggaaaccggg 1980 gatgccctgtggcttga 1997

What is claimed is:
 1. An isolated polypeptide selected from the group consisting of: a) a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, b) a polypeptide comprising a naturally occurring amino acid sequence at least 90% identical to an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, c) a biologically active fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, and d) an immunogenic fragment of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21.
 2. An isolated polypeptide of claim 1 selected from the group consisting of SEQ ID NO:1-21.
 3. An isolated polynucleotide encoding a polypeptide of claim 1.
 4. An isolated polynucleotide encoding a polypeptide of claim 2.
 5. An isolated polynucleotide of claim 4 selected from the group consisting of SEQ ID NO:22-42.
 6. A recombinant polynucleotide comprising a promoter sequence operably linked to a polynucleotide of claim 3.
 7. A cell transformed with a recombinant polynucleotide of claim 6.
 8. A transgenic organism comprising a recombinant polynucleotide of claim 6.
 9. A method for producing a polypeptide of claim 1, the method comprising: a) culturing a cell under conditions suitable for expression of the polypeptide, wherein said cell is transformed with a recombinant polynucleotide, and said recombinant polynucleotide comprises a promoter sequence operably linked to a polynucleotide encoding the polypeptide of claim 1, and b) recovering the polypeptide so expressed.
 10. An isolated antibody which specifically binds to a polypeptide of claim 1.
 11. An isolated polynucleotide selected from the group consisting of: a) a polynucleotide comprising a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, b) a polynucleotide comprising a naturally occurring polynucleotide sequence at least 90% identical to a polynucleotide sequence selected from the group consisting of SEQ ID NO:22-42, c) a polynucleotide complementary to a polynucleotide of a), d) a polynucleotide complementary to a polynucleotide of b), and e) an RNA equivalent of a)-d).
 12. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide of claim 11.
 13. A method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide of claim 11, the method comprising: a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the sample, and which probe specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization complex is formed between said probe and said target polynucleotide or fragments thereof, and b) detecting the presence or absence of said hybridization complex, and, optionally, if present, the amount thereof.
 14. A method of claim 13, wherein the probe comprises at least 60 contiguous nucleotides.
 15. A method for detecting a target polynucleotide in a sample, said target polynucleotide having a sequence of a polynucleotide of claim 11, the method comprising: a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction amplification, and b) detecting the presence or absence of said amplified target polynucleotide or fragment thereof, and, optionally, if present, the amount thereof.
 16. A composition comprising a polypeptide of claim 1 and a pharmaceutically acceptable excipient.
 17. A composition of claim 16, wherein the polypeptide has an amino acid sequence selected from the group consisting of SEQ ID NO:1-21.
 18. A method for treating a disease or condition associated with decreased expression of functional PRTS, comprising administering to a patient in need of such treatment the composition of claim 16.
 19. A method for screening a compound for effectiveness as an agonist of a polypeptide of claim 1, the method comprising: a) exposing a sample comprising a polypeptide of claim 1 to a compound, and b) detecting agonist activity in the sample.
 20. A composition comprising an agonist compound identified by a method of claim 19 and a pharmaceutically acceptable excipient.
 21. A method for treating a disease or condition associated with decreased expression of functional PRTS, comprising administering to a patient in need of such treatment a composition of claim 20.
 22. A method for screening a compound for effectiveness as an antagonist of a polypeptide of claim 1, the method comprising: a) exposing a sample comprising a polypeptide of claim 1 to a compound, and b) detecting antagonist activity in the sample.
 23. A composition comprising an antagonist compound identified by a method of claim 22 and a pharmaceutically acceptable excipient.
 24. A method for treating a disease or condition associated with overexpression of functional PRTS, comprising administering to a patient in need of such treatment a composition of claim 23.
 25. A method of screening for a compound that specifically binds to the polypeptide of claim 1, said method comprising the steps of: a) combining the polypeptide of claim 1 with at least one test compound under suitable conditions, and b) detecting binding of the polypeptide of claim 1 to the test compound, thereby identifying a compound that specifically binds to the polypeptide of claim 1.
 26. A method of screening for a compound that modulates the activity of the polypeptide of claim 1, said method comprising: a) combining the polypeptide of claim 1 with at least one test compound under conditions permissive for the activity of the polypeptide of claim 1, b) assessing the activity of the polypeptide of claim 1 in the presence of the test compound, and c) comparing the activity of the polypeptide of claim 1 in the presence of the test compound with the activity of the polypeptide of claim 1 in the absence of the test compound, wherein a change in the activity of the polypeptide of claim 1 in the presence of the test compound is indicative of a compound that modulates the activity of the polypeptide of claim 1.
 27. A method for screening a compound for effectiveness in altering expression of a target polynucleotide, wherein said target polynucleotide comprises a sequence of claim 5, the method comprising: a) exposing a sample comprising the target polynucleotide to a compound, under conditions suitable for the expression of the target polynucleotide, b) detecting altered expression of the target polynucleotide, and c) comparing the expression of the target polynucleotide in the presence of varying amounts of the compound and in the absence of the compound.
 28. A method for assessing toxicity of a test compound, said method comprising: a) treating a biological sample containing nucleic acids with the test compound; b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at least 20 contiguous nucleotides of a polynucleotide of claim 11 under conditions whereby a specific hybridization complex is formed between said probe and a target polynucleotide in the biological sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide of claim 11 or fragment thereof; c) quantifying the amount of hybridization complex; and d) comparing the amount of hybridization complex in the treated biological sample with the amount of hybridization complex in an untreated biological sample, wherein a difference in the amount of hybridization complex in the treated biological sample is indicative of toxicity of the test compound.
 29. A diagnostic test for a condition or disease associated with the expression of PRTS in a biological sample comprising the steps of: a) combining the biological sample with an antibody of claim 10, under conditions suitable for the antibody to bind the polypeptide and form an antibody:polypeptide complex; and b) detecting the complex, wherein the presence of the complex correlates with the presence of the polypeptide in the biological sample.
 30. The antibody of claim 10, wherein the antibody is: a) a chimeric antibody, b) a single chain antibody, c) a Fab fragment, d) a F(ab′)₂ fragment, or e) a humanized antibody.
 31. A composition comprising an antibody of claim 10 and an acceptable excipient.
 32. A method of diagnosing a condition or disease associated with the expression of PRTS in a subject, comprising administering to said subject an effective amount of the composition of claim 31.
 33. A composition of claim 31, wherein the antibody is labeled.
 34. A method of diagnosing a condition or disease associated with the expression of PRTS in a subject, comprising administering to said subject an effective amount of the composition of claim 33.
 35. A method of preparing a polyclonal antibody with the specificity of the antibody of claim 10 comprising: a) immunizing an animal with a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, or an immunogenic fragment thereof, under conditions to elicit an antibody response; b) isolating antibodies from said animal; and c) screening the isolated antibodies with the polypeptide, thereby identifying a polyclonal antibody which binds specifically to a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21.
 36. An antibody produced by a method of claim 35.
 37. A composition comprising the antibody of claim 36 and a suitable carrier.
 38. A method of making a monoclonal antibody with the specificity of the antibody of claim 10 comprising: a) immunizing an animal with a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21, or an immunogenic fragment thereof, under conditions to elicit an antibody response; b) isolating antibody producing cells from the animal; c) fusing the antibody producing cells with immortalized cells to form monoclonal antibody-producing hybridoma cells; d) culturing the hybridoma cells; and e) isolating from the culture monoclonal antibody which binds specifically to a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21.
 39. A monoclonal antibody produced by a method of claim 38.
 40. A composition comprising the antibody of claim 39 and a suitable carrier.
 41. The antibody of claim 10, wherein the antibody is produced by screening a Fab expression library.
 42. The antibody of claim 10, wherein the antibody is produced by screening a recombinant immunoglobulin library.
 43. A method for detecting a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21 in a sample, comprising the steps of: a) incubating the antibody of claim 10 with a sample under conditions to allow specific binding of the antibody and the polypeptide; and b) detecting specific binding, wherein specific binding indicates the presence of a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21 in the sample.
 44. A method of purifying a polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21 from a sample, the method comprising: a) incubating the antibody of claim 10 with a sample under conditions to allow specific binding of the antibody and the polypeptide; and b) separating the antibody from the sample and obtaining the purified polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:1-21.
 45. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:1.
 46. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:2.
 47. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:3.
 48. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:4.
 49. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:5.
 50. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:6.
 51. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:7.
 52. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:8.
 53. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:9.
 54. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:10.
 55. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:11.
 56. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:12.
 57. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:13.
 58. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:14.
 59. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:15.
 60. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:16.
 61. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:17.
 62. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:18.
 63. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:19.
 64. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:20.
 65. A polypeptide of claim 1, comprising the amino acid sequence of SEQ ID NO:21.
 66. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:22.
 67. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:23.
 68. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:24.
 69. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:25.
 70. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:26.
 71. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:27.
 72. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:28.
 73. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:29.
 74. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:30.
 75. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:31.
 76. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:32.
 77. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:33.
 78. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:34.
 79. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:35.
 80. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:36.
 81. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:37.
 82. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:38.
 83. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:39.
 84. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:40.
 85. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:41.
 86. A polynucleotide of claim 11, comprising the polynucleotide sequence of SEQ ID NO:42.